How to Analyze Sequenced Video Game Music

This document introduces my generic analysis method for sequenced video game music. I hope this will help your research.

Prerequisites

The following knowledge is necessary to understand this document.

  • Bit, byte, endianness
  • Decimal number, hexadecimal number (hex), two's complement
  • Basic knowledge about MIDI
    • How to view/create a MIDI file using a MIDI editor/sequencer
    • Concept of a track and a channel

The following knowledge is not a must, but it will help you to understand this document more deeply.

  • Specification of Standard MIDI File (SMF)
  • Hardware specification of the target platform (console, handheld)
    • In particular, it is good to know which component produces sound and how it is made to play.
  • Experience with a programming language (for instance, C) and assembly
    • This document does not require code analysis, but such experience will definitely help you imagine what the music driver is doing behind the scenes.
  • Debugger tools
  • Compression algorithms (such as RLE or LZSS)

Necessary tools

  • Hex editor: There are countless editors, so use your favorite one. I usually use Japanese editors such as Stirling, QuickBe and Binary Editor Bz.
  • Emulator: Since each emulator has different features, you may want to have two or more emulators. I suggest you visit TASVideos / Emulator Resources. They usually use emulators that are stable and have useful tools such as RAM search, a debugger and Lua scripting. Additionally, they must be open-source.
  • Memory scanner, (x86) debugger: This is optional. By attaching a debugger to an emulator, you will be able to use debugging functions such as RAM tools and read/write breakpoints. I can introduce two famous tools, Memory Hacking Software (MHS) and Cheat Engine. I personally prefer MHS, but Cheat Engine has a helpful tutorial. I recommend downloading both and trying the Cheat Engine tutorial.

Generic layout of music sequence

The specification of a music sequence differs from game to game, but there is a common layout.

Single track or multiple track

These correspond to the difference between SMF format 0 and format 1. In the single-track format, all channels are played in parallel from one track, and each message carries a channel number.

Most music sequences use the multiple-track style, but keep in mind that there are occasional exceptions.

Mono or poly

Recent music formats can play a chord within a single track, but this is usually not possible in formats for nostalgic game consoles such as the Super Nintendo, because each track is tied to a single hardware channel. In contrast, the method of dynamically assigning a voice to an unused channel is called DVA (Dynamic Voice Allocation).

In most cases, you do not need to be conscious of these differences during file format analysis. However, a DVA-style music engine is more likely to be able to handle a chord in a track, so its message format may be designed with such use in mind.
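
To make the idea of DVA more concrete, here is a minimal sketch of an allocator. The class name, the "first free channel" policy, and the choice to drop a note when every channel is busy are all my assumptions for illustration; real drivers may steal the oldest or quietest voice instead.

```python
class VoiceAllocator:
    """Toy dynamic voice allocation: any track may use any free channel."""

    def __init__(self, channel_count: int):
        # owner[ch] is None while hardware channel ch is unused.
        self.owner = [None] * channel_count

    def note_on(self, track: int, key: int):
        """Claim the first unused channel; return its index, or None if full."""
        for ch, owner in enumerate(self.owner):
            if owner is None:
                self.owner[ch] = (track, key)
                return ch
        return None  # assumption: the note is simply dropped

    def note_off(self, track: int, key: int) -> None:
        """Release the channel that is playing this track's key, if any."""
        for ch, owner in enumerate(self.owner):
            if owner == (track, key):
                self.owner[ch] = None
                return
```

With two channels, a single track can sound two keys at once (a chord), which a fixed track-to-channel binding cannot do.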

Message

A message, event, command, or pseudo-opcode... It goes by various names. In this document, I call it a message, as in a MIDI message.

A message is a byte sequence that tells a music driver to play a sound. Usually, the first byte expresses the message type, and the following bytes are its arguments. Most messages are fixed-length: the number of argument bytes is determined by the message type. However, there are also variable-length messages whose length depends on a parameter value.

In most cases, timings are recorded as tick counts, as in a standard MIDI file. However, some music engines express a predefined length by a constant number (such as whole note = 0, half note = 1). *1
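
As a small illustration of the "constant number" style, the sketch below maps a length code to a tick count through a lookup table. The resolution of 96 ticks per whole note and the codes beyond 0 and 1 are assumptions; an actual driver defines its own table.

```python
# Assumed resolution: 96 ticks per whole note (hypothetical value).
TICKS_PER_WHOLE_NOTE = 96

# Length code -> ticks: 0 = whole, 1 = half, 2 = quarter, 3 = eighth, ...
LENGTH_TABLE = [TICKS_PER_WHOLE_NOTE >> code for code in range(6)]


def length_code_to_ticks(code: int) -> int:
    """Translate a length code from the sequence into a tick count."""
    return LENGTH_TABLE[code]
```

When you see small, rarely varying values where durations should be, a table like this somewhere in the driver is a likely explanation.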

The following is a pseudo message dump example.

# pseudo music sequence dump

# Legend:
# ADDRESS: XX (where XX is a hexadecimal number)

# volume = 100
1200: 81 64
# panpot = 64
1202: 82 40
# c d (quarter note)
1204: 3C 18
1206: 3E 18
# e f g rest (eighth note)
1208: 40 0C
120A: 41 0C
120C: 43 0C
120E: 00 0C
# goto address $1204, for infinite loop.
# the byte-order is reversed because the format uses little-endian.
1210: 80 04 12
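
The pseudo dump above can be turned into a tiny decoder. This is only a sketch of that imaginary format: the message numbers (80 = goto, 81 = volume, 82 = panpot, 00 = rest, 01 to 7F = note with MIDI-like key numbers) are taken from the dump, and the function and variable names are my own.

```python
import struct

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def decode(data: bytes, base: int):
    """Return a list of (address, description) for the pseudo format."""
    out = []
    pos = 0
    while pos < len(data):
        addr = base + pos
        op = data[pos]
        if op == 0x80:
            # goto: the argument is a 16-bit little-endian address
            (target,) = struct.unpack_from("<H", data, pos + 1)
            out.append((addr, f"goto ${target:04X}"))
            pos += 3
        elif op == 0x81:
            out.append((addr, f"volume {data[pos + 1]}"))
            pos += 2
        elif op == 0x82:
            out.append((addr, f"panpot {data[pos + 1]}"))
            pos += 2
        elif op == 0x00:
            out.append((addr, f"rest {data[pos + 1]} ticks"))
            pos += 2
        elif op < 0x80:
            # note: key number followed by a duration in ticks
            out.append((addr, f"note {NOTE_NAMES[op % 12]} {data[pos + 1]} ticks"))
            pos += 2
        else:
            raise ValueError(f"unknown message {op:02X} at ${addr:04X}")
    return out


# The same bytes as the dump, assumed to be loaded at $1200.
SEQ = bytes.fromhex("8164 8240 3C18 3E18 400C 410C 430C 000C 800412")

for address, text in decode(SEQ, 0x1200):
    print(f"{address:04X}: {text}")
```

Raising an error on an unknown message is deliberate: during analysis, the first byte the decoder cannot explain is usually the next thing worth investigating.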

Messages can be classified roughly, as follows.

  • Note, rest, and tie
  • Control changes (volume change, panpot change, etc.)
  • Branches (such as unconditional/subroutine jump), repeats, and end of track
  • Others, such as game-specific messages (communicate between main CPU and sound unit)

First of all, I usually search for notes, then begin researching major control changes, the end of track, branches and repeats.

Header

At the very least, there must be a region that records the start position of each track. *2 Some formats may also hold initial values for tempo, echo, etc.

Modern formats are probably portable (platform-independent and relocatable), so their headers should be just before the track data. However, in nostalgic games, such a header can be located far from the track data.
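
For example, a relocatable header might look like "track count, then one offset per track". The sketch below reads such a header; the layout (one count byte followed by 16-bit little-endian offsets relative to the header start) is purely hypothetical, and a real format has to be confirmed by analysis.

```python
import struct


def read_track_pointers(data: bytes, header_pos: int) -> list[int]:
    """Return the absolute start position of each track.

    Assumed layout: data[header_pos] is the track count, followed by
    that many 16-bit little-endian offsets relative to header_pos.
    """
    track_count = data[header_pos]
    offsets = struct.unpack_from(f"<{track_count}H", data, header_pos + 1)
    return [header_pos + off for off in offsets]
```

In an older game the offsets may instead be absolute addresses in the sound unit's memory, in which case the `header_pos +` term would be dropped.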

Compression

Sequences are sometimes compressed. In particular, LZSS is widely used for this.

I think the following pages explain LZSS clearly in detail.

You may think you cannot decompress these formats without reading a disassembly. That is not always true. If the game uses a simple and commonly used compression scheme, you can often guess the details well enough.
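
To show how little there is to guess, here is a sketch of a decoder for one LZSS-style layout. Every detail is an assumption that varies between games: here a flag byte covers eight items (most significant bit first, 1 = back-reference, 0 = literal), and a back-reference is two bytes holding a 4-bit length (stored minus 3) and a 12-bit distance (stored minus 1).

```python
def lzss_decompress(src: bytes, out_size: int) -> bytes:
    """Decode the assumed LZSS-style layout described above."""
    out = bytearray()
    pos = 0
    while len(out) < out_size:
        flags = src[pos]
        pos += 1
        for bit in range(8):
            if len(out) >= out_size:
                break
            if flags & (0x80 >> bit):
                # Back-reference: copy `length` bytes from `distance` back.
                b0, b1 = src[pos], src[pos + 1]
                pos += 2
                length = (b0 >> 4) + 3
                distance = (((b0 & 0x0F) << 8) | b1) + 1
                for _ in range(length):
                    # Byte-by-byte copy so overlapping references work.
                    out.append(out[-distance])
            else:
                # Literal byte, copied as-is.
                out.append(src[pos])
                pos += 1
    return bytes(out[:out_size])
```

When guessing a real format, the things to vary are the flag bit order, the flag polarity, the split between length and distance bits, and the bias added to each.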

Static analysis on score data

Static analysis is the simplest and most basic approach. Listen to the music, imagine what the score data could look like, search for such a byte sequence, and start the analysis from there.

This method has one great advantage: it does not require any special devices or tools. You can begin the research with only your favorite hex editor and an input file that contains music. Actually, I analyzed the SSEQ format of the Nintendo DS in exactly this situation.

On the other hand, this method requires a good imagination. The analysis becomes more difficult if your target sequence contains complicated messages or is compressed with an unknown algorithm.

This method is handy for getting started. However, I strongly recommend using an emulator whenever one is available.

Search a melody

First of all, decide the target song to analyze.

Pick a song with a short, monophonic melody that repeats the same note length (not a long one such as a whole note) at the same volume. It is better if the melody has no rests. The keys should move up and down only a little, not in large leaps.

Okay, let's imagine an example.

# c4 c4 f4 d4
# with weak staccato, that is expressed by a pair of note and rest.

# Example 1 (MIDI-like pattern)
#     key  duration
0000: A8   16
#     ^ note c
0002: 80   02
#     ^ rest
0004: A8   16
#     ^ note c
0006: 80   02
#     ^ rest
0008: AD   16
#     ^ note f
000A: 80   02
#     ^ rest
000C: AA   16
#     ^ note d
000E: 80   02
#     ^ rest

It could be a little simpler.

# c4 c4 f4 d4

# Example 2 (simpler pattern)
#     key  duration
0000: A8   18
0002: A8   18
0004: AD   18
0006: AA   18

Repeating the same value is redundant, so the repeated durations could be omitted.

# c4 c4 f4 d4

# Example 3 (even shorter pattern, omits the repetition of duration)
#     key  duration
0000: A8   18
0002: A8
0003: AD
0004: AA

In any case, I expect the changes between keys to show up in the byte values.

In the above examples, the relative changes between keys are: c c f d → 0 +5 -3 (00 05 FD in hexadecimal, where FD is -3 in two's complement).

You may be able to find the melody by using a tool that applies a differential filter, together with a hex editor that has a wildcard search function (XVI32 or Kasat (Japanese)).

Good news! I created a tool called MeloSearch and managed to automate the above boring process. You can find the melody in every example above with the following command.

MeloSearch data.seq "ccfd"

The search will fail when the actual data differs greatly from your expectation. In particular, it is somewhat harder to succeed with a single-track sequence. Still, with luck you will find the melody and can continue your analysis from there.
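
The differential idea itself is easy to sketch. The code below is not how MeloSearch is implemented; it is a minimal illustration that assumes notes sit at a fixed stride (1 to 4 bytes apart) and that keys are stored as semitone numbers. The function names are my own.

```python
def to_intervals(notes: str) -> list[int]:
    """Convert note letters such as "ccfd" to byte-sized key differences."""
    semitone = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}
    keys = [semitone[n] for n in notes]
    return [(b - a) & 0xFF for a, b in zip(keys, keys[1:])]


def search_melody(data: bytes, notes: str, max_stride: int = 4):
    """Return (offset, stride) pairs where the interval pattern matches."""
    want = to_intervals(notes)
    hits = []
    for stride in range(1, max_stride + 1):
        span = stride * len(want)
        for start in range(len(data) - span):
            got = [
                (data[start + (i + 1) * stride] - data[start + i * stride]) & 0xFF
                for i in range(len(want))
            ]
            if got == want:
                hits.append((start, stride))
    return hits
```

On Example 1 this reports offset 0 with stride 4, and on Example 2 offset 0 with stride 2. Because the stride is fixed, Example 3 only matches from its second note onward (searching "cfd" finds offset 2 with stride 1); coping with such irregular layouts needs a wildcard-style search.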

Have good ears, and improve your imagination.

Disassemble

If you can read disassembly, I recommend that you look over it and analyze how the music engine works, since many things are hard to understand from a byte sequence alone.

This document does not focus on it, because it is harder than the other methods and requires more hardware-specific knowledge.

Try it if you are interested. It will improve your skills a lot.

Using emulators

By using an emulator, you can read or manipulate the actual memory values (if your emulator does not have a memory viewer and a file export function, you need to locate the RAM region with a memory scanner such as MHS). This allows you:

  • To extract uncompressed data. Compressed data must be held in memory in uncompressed form while it is in use. *3
  • To find song data by locating the pointer to the current playing position and watching its value.
  • To edit a part of a sequence and hear what changes.

Extracting uncompressed data

This is easy.

  1. Play the game until the song starts playing
  2. Export the memory image (if you are using an emulator recommended by TASVideos, this can usually be done by clicking the "Dump All" button in the "View Memory" window)
  3. Try to find the data in it

Search pointer to the current playing position

I assume you know the basics of how to search for a variable. You can learn them from TASVideos / Memory Search and/or Cheat Engine's tutorial.

  1. Play the game until the song starts playing
  2. Reset your search
  3. Do a "not equal to the previous value" search whenever a track plays a different note
  4. Also, do an "equal to the previous value" search while the track does not change anything
  5. Find the pointer and look at the location it points to

Finally, note down the pointer addresses and add them to your RAM watch.
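
The search steps above are what a memory scanner does for you, but the logic can be sketched offline on a series of RAM dumps. The sketch assumes a 16-bit little-endian pointer and that you recorded, for each dump, whether the track had moved to a different note since the previous dump; both the data layout and the function names are hypothetical.

```python
import struct


def read_u16(ram: bytes, addr: int) -> int:
    """Read a 16-bit little-endian value from a RAM dump."""
    return struct.unpack_from("<H", ram, addr)[0]


def filter_candidates(snapshots):
    """Narrow down pointer candidates.

    snapshots: list of (ram_bytes, note_changed) in capture order.
    note_changed is ignored for the first entry, which only serves as
    the starting point (the "reset" step).
    """
    first_ram, _ = snapshots[0]
    candidates = set(range(len(first_ram) - 1))
    prev = first_ram
    for ram, note_changed in snapshots[1:]:
        # Keep an address only if "value changed" agrees with what was heard.
        candidates = {
            a for a in candidates
            if (read_u16(ram, a) != read_u16(prev, a)) == note_changed
        }
        prev = ram
    return sorted(candidates)
```

The "equal" steps matter as much as the "not equal" ones: they are what eliminates frame counters and timers that change constantly. Expect a few neighbouring addresses to survive as well, since they overlap the low byte of the real pointer.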

Edit a part of sequence

You can edit a part of a sequence and hear what changes. Savestates help you apply changes to the same phrase repeatedly.

You can also use this method to see the length of a message.

  1. Write the message whose length you want to know
  2. Fill the bytes that follow it (the would-be arguments) with "end of track" or "long wait" messages
  3. Read the pointer to the current playing position just after the target message gets dispatched
  4. The message length is: pointer_value - message_start_address (unless it is a jump message)
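
The arithmetic of this procedure is simple, but a sketch may help. The helper names and the example byte values below are hypothetical; the filler byte stands for whatever "end of track" or "long wait" message your target format uses.

```python
def build_probe(message_type: int, filler: int, max_args: int = 8) -> bytes:
    """Bytes to write into the sequence: the message under test,
    followed by filler messages where its arguments would be."""
    return bytes([message_type]) + bytes([filler]) * max_args


def message_length(pointer_value: int, message_start_address: int) -> int:
    """Length in bytes, valid unless the message is a jump."""
    return pointer_value - message_start_address
```

For instance, if the probe was written at $1204 and the pointer reads $1207 right after dispatch, the message occupies 3 bytes: one type byte and two argument bytes.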

*1:However, even such a music driver manages timing internally by tick counts.

*2:Otherwise, every track would always have to be located at a fixed address.

*3:However, if your target game does not compress sequence data and the console does not need dedicated RAM for sound, the game has no reason to copy the sequence data to RAM, because it can read the data directly from ROM. For example, I guess that most GBA games read song data directly from ROM.