Deep Dive

The fix for a segfault that never shipped

Updated at:
January 20, 2026

At Recall.ai, we run an unusual workload. We record millions of hours of meetings every month. Each of these meetings generates a large amount of audio and video we need to reliably capture and analyze. The audio we capture comes in a variety of shapes and sizes, an assortment of codecs, channels, sample rates, interleaving, and error-correction schemes. We normalize all of those into one consistent format that is universally playable.

We launch 18 million of EC2 instances every month, each of these instances is a “meeting bot”, which joins a video call and captures the data in real-time. Extremely rarely, about 1 in 36 million bots would abruptly crash deep in library code of our media pipeline. Unlike most web servers, a meeting bot instance is extremely stateful and very hard to replace, a fatal error like this means the data is irrecoverably lost - forever. Even a one-in-tens-of-millions failure rate was unacceptable. This is the story of how we tracked down the rare bug, went to significant effort to reproduce it, identify the root cause and fix it.

TL;DR

We encountered an extremely rare segfault in the AAC encoder we were using and root caused it to a bug in the fixed-point math C code. We found this was patched over a decade ago but the fix was never shipped to downstream consumers. Rather than fixing the bug we replaced the library with a modern AAC encoder which did not experience these crashes.

Capturing the elusive core dump

This crash was so rare that reproducing it locally wasn’t feasible. Instead we opted to capture the program state from production, in the rare event this happens.

Our first clue was the process 139 exit code (139 = 128 + 11). Signal 11 corresponds to SIGSEGV, a segmentation fault or segfault for short, this occurs when a program tries to access memory it is not supposed to. This mistake is bad enough that the execution of the program should halt immediately.

So how do we determine the cause of a SIGSEGV? The answer is often a core dump file. A core dump lets us inspect the state of the program at the instant of the crash, this lets us see many useful things such as the stacktrace, variable/memory contents and more.

Because our bots are run on ephemeral EC2 instances, the entire VM is terminated upon exit. So the core dump files are erased too. In order to diagnose the cause of this segfault we needed to start reliably collecting core dumps across nodes in our cluster. We shipped a minimal rust binary, affectionately named Garbage Truck, which configures the kernel to output core dumps to a well-known location and uploads them to S3.

Garbage Truck

Analyzing the core dump

Getting the stack trace

After I had the core dump in hand, I spun up a new node on our cluster, using the same image where the crash occurred to ensure all binaries and system libraries are where the core dump expects them to be. We use a debian base image, so I installed gdb and loaded up the core dump:

# Install tools
> sudo apt install -y gdb file
# Tell us about the core dmp
> file /path/to/core-dump
core-dump: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'python3 entrypoint.py', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/bin/python3', platform: 'x86_64'
# file tells us that /usr/bin/python3 was the executing binary
# Lets load up the core dump with gdb
>  gdb /usr/bin/python3 /path/to/core-dump
[Current thread is 1 (Thread 0x7f1c3e1fc6c0 (LWP 432))]
# Show me the stacktrace
(gdb) bt
0  0x00007f1caaa58cbb in ?? () from /lib/x86_64-linux-gnu/libvo-aacenc.so.0
1  0x00007f1caaa5bbda in ?? () from /lib/x86_64-linux-gnu/libvo-aacenc.so.0
2  0x00007f1caaa6540b in ?? () from /lib/x86_64-linux-gnu/libvo-aacenc.so.0
3  0x00007f1caaa597ed in ?? () from /lib/x86_64-linux-gnu/libvo-aacenc.so.0
4  0x00007f1caaa58fdb in ?? () from /lib/x86_64-linux-gnu/libvo-aacenc.so.0
5  0x00007f1cab2a4a58 in ?? () from /lib/x86_64-linux-gnu/gstreamer-1.0/libgstvoaacenc.so
6  0x00007f1caa6ca481 in ?? () from /lib/x86_64-linux-gnu/libgstaudio-1.0.so.0
7  0x00007f1caa6cb6ac in ?? () from /lib/x86_64-linux-gnu/libgstaudio-1.0.so.0
8  0x00007f1cadd202a9 in ?? () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
9  0x00007f1cadd227b1 in ?? () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
10 0x00007f1cadd29b6b in gst_pad_push () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
11 0x00007f1caaad5ed4 in ?? () from /lib/x86_64-linux-gnu/gstreamer-1.0/libgstcoreelements.so
12 0x00007f1cadd59441 in ?? () from /lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
13 0x00007f1cae2ec6ca in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
14 0x00007f1cae2ebcfd in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
15 0x00007f1cb0191fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
16 0x00007f1cb021266c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Now we have the stacktrace telling us where our program crashed. We use GStreamer as our media processing framework so it is unsurprising see the crash originating there. The crash occurred in /lib/x86_64-linux-gnu/libvo-aacenc.so.0 which was our AAC audio encoding library. The majority of the stack frames are designated with ?? meaning gdb does not have the symbol information for the associated binaries. Without this, besides staring blindly at the disassembly, there is little we can do to better understand the crash. At this point I suspected the following in the order of probability:

  1. We are subtly misusing some Gstreamer API causing a use-after-free or corrupting the encoder state
  2. The voaacenc gstreamer plugin has a bug causing the crash
  3. libvo-aacenc has a bug causing the crash

Thankfully, our debian base image compiles most libraries with separate debugging symbols and makes them readily available to our gdb:

# Specify the debuginfod url for gdb
> export DEBUGINFOD_URLS="https://debuginfod.debian.net"
# Load up the core dump
> gdb /usr/bin/python3 /path/to/core-dump
# gdb loads all the required debug files...
# Show me the stacktrace
(gdb) bt
0  0x00007f1caaa58cbb in voAACEnc_pow2_xy (x=<optimized out>, y=y@entry=28) at aacenc/basic_op/oper_32b.c:358
1  0x00007f1caaa5bbda in allowMoreHoles (desiredPe=<optimized out>, nChannels=<optimized out>, ahParam=0x7f1c74021082, ahFlag=0x7f1c3e1fb354, peData=0x7f1c3e1fae90, psyOutElement=0x7f1c74022228,
   psyOutChannel=0x7f1c74022488) at aacenc/src/adj_thr.c:676
2  adaptThresholdsToPe (msaParam=0x7f1c74021088, ahParam=0x7f1c74021082, desiredPe=1467, nChannels=1, peData=0x7f1c3e1fae90, logSfbEnergy=<optimized out>, psyOutElement=0x7f1c74022228,
   psyOutChannel=0x7f1c74022488) at aacenc/src/adj_thr.c:812
3  AdjustThresholds (adjThrState=adjThrState@entry=0x7f1c7402105c, AdjThrStateElement=AdjThrStateElement@entry=0x7f1c7402107c, psyOutChannel=psyOutChannel@entry=0x7f1c74022488,
   psyOutElement=psyOutElement@entry=0x7f1c74022228, chBitDistribution=chBitDistribution@entry=0x7f1c3e1fb8d4, logSfbEnergy=logSfbEnergy@entry=0x7f1c74021284, sfbNRelevantLines=0x7f1c74021194,
   qcOE=0x7f1c74022208, elBits=0x7f1c7402104c, nChannels=1, maxBitFac=395) at aacenc/src/adj_thr.c:1187
4  0x00007f1caaa6540b in QCMain (hQC=hQC@entry=0x7f1c7402103c, elBits=elBits@entry=0x7f1c7402104c, adjThrStateElement=adjThrStateElement@entry=0x7f1c7402107c, psyOutChannel=0x7f1c74022488,
   psyOutElement=psyOutElement@entry=0x7f1c74022228, qcOutChannel=0x7f1c74021378, qcOutElement=0x7f1c74022208, nChannels=1, ancillaryDataBytes=0) at aacenc/src/qc_main.c:298
5  0x00007f1caaa597ed in AacEncEncode (aacEnc=aacEnc@entry=0x7f1c74021020, timeSignal=<optimized out>, ancBytes=ancBytes@entry=0x0, numAncBytes=numAncBytes@entry=0x7f1c3e1fb9a6, outBytes=<optimized out>,
   numOutBytes=numOutBytes@entry=0x7f1c3e1fba18) at aacenc/src/aacenc_core.c:180
6  0x00007f1caaa58fdb in voAACEncGetOutputData (hCodec=0x7f1c74021020, pOutput=0x7f1c3e1fba10, pOutInfo=0x7f1c3e1fba30) at aacenc/src/aacenc.c:247
7  0x00007f1cab2a4a58 in gst_voaacenc_handle_frame (benc=0x3547ab0, buf=0x7f1c30017360) at ../ext/voaacenc/gstvoaacenc.c:424
8  0x00007f1caa6ca481 in gst_audio_encoder_push_buffers (enc=enc@entry=0x3547ab0, force=force@entry=0) at ../gst-libs/gst/audio/gstaudioencoder.c:1180
9  0x00007f1caa6cb6ac in gst_audio_encoder_chain (pad=<optimized out>, parent=<optimized out>, buffer=0x7f1c68054d80) at ../gst-libs/gst/audio/gstaudioencoder.c:1410
10 0x00007f1cadd202a9 in gst_pad_chain_data_unchecked (pad=pad@entry=0x338ef50, type=type@entry=4112, data=data@entry=0x7f1c68054d80) at ../gst/gstpad.c:4463
11 0x00007f1cadd227b1 in gst_pad_push_data (pad=pad@entry=0x338ed00, type=type@entry=4112, data=data@entry=0x7f1c68054d80) at ../gst/gstpad.c:4739
12 0x00007f1cadd29b6b in gst_pad_push (pad=0x338ed00, buffer=buffer@entry=0x7f1c68054d80) at ../gst/gstpad.c:4858
13 0x00007f1caaad5ed4 in gst_queue_push_one (queue=0x3544160) at ../plugins/elements/gstqueue.c:1388
14 gst_queue_loop (pad=<optimized out>) at ../plugins/elements/gstqueue.c:1541
15 0x00007f1cadd59441 in gst_task_func (task=0x354acb0) at ../gst/gsttask.c:384
16 0x00007f1cae2ec6ca in g_thread_pool_thread_proxy (data=<optimized out>) at ../../../glib/gthreadpool.c:352
17 0x00007f1cae2ebcfd in g_thread_proxy (data=0x7f1c74017520) at ../../../glib/gthread.c:831
18 0x00007f1cb0191fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
19 0x00007f1cb021266c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

From the above we know that this crash occurs in voAACEnc_pow2_xy in libvo-aacenc. A quick google search does not reveal much either. Only one poor soul, using the same encoder via ffmpeg, potentially ran into this over 12 years ago.

It turns out that we were relying on the VisualOn AAC encoder, which is ancient by modern standards.

At this point it's obvious that switching to a modern AAC encoder would be a good idea, and we did just that.

However, we work with a tremendous amount of highly-optimized media processing code. It's helpful to deeply understand the root cause of a bug like this, as it likely won't be the last. So after shipping the new encoder to production, I continued investigating the root cause of the segfault.

Reproducing the program state

The source code of voAACEnc_pow2_xy is:

/*
  \brief calculates 2 ^ (x/y) for x<=0, y > 0, x <= 32768 * y

  avoids integer division

  \return
*/
Word32 pow2_xy(Word32 x, Word32 y)
{
  UWord32 iPart;
  UWord32 fPart;
  Word32 res;
  Word32 tmp, tmp2;
  Word32 shift, shift2;

  tmp2 = -x;
  iPart = tmp2 / y;
  fPart = tmp2 - iPart*y;
  iPart = min(iPart,INT_BITS-1);

  // crash occurs here
  res = pow2Table[(POW2_TABLE_SIZE*fPart)/y] >> iPart;

  return(res);
}

We are indexing out of bounds of the statically allocated pow2Table. The function itself is an optimised implementation of 2 ^ (x/y) with some invariants. So perhaps we are breaking these invariants (x<=0, y > 0) somehow.

As the next step in my investigation, I looked at the code that called this pow2_xy function. Here is an annotated version:

    // average energy
    avgEn = 0;
    // minimum energy
    minEn = MAX_32;
    // counter
    ahCnt = 0;
    // for each channel
    for (ch=0; ch<nChannels; ch++) {
      PSY_OUT_CHANNEL *psyOutChan = &psyOutChannel[ch];
      for (sfb=startSfb[ch]; sfb<psyOutChan->sfbCnt; sfb++) {

        if ((ahFlag[ch][sfb] != NO_AH) &&
            (psyOutChan->sfbEnergy[sfb] > psyOutChan->sfbThreshold[sfb])) {
          // calculate the running minimum
          minEn = min(minEn, psyOutChan->sfbEnergy[sfb]);
          // sum all energies
          avgEn = L_add(avgEn, psyOutChan->sfbEnergy[sfb]);
          ahCnt++;
        }
      }
    }

    // calculate the average using fixed-point multiplication to "divide" by ahCnt
    if(ahCnt) {
      Word32 iahCnt;
      shift = norm_l(ahCnt);
      iahCnt = Div_32( 1 << shift, ahCnt << shift );
      avgEn = fixmul(avgEn, iahCnt);
    }

    // calculate the logarithmic-scale difference 
    enDiff = iLog4(avgEn) - iLog4(minEn);
    // what ever this means...
    /* calc some energy borders between minEn and avgEn */
    for (enIdx=0; enIdx<4; enIdx++) {
      Word32 enFac;
      enFac = ((6-(enIdx << 1)) * enDiff);
      en[enIdx] = fixmul(avgEn, pow2_xy(L_negate(enFac),7*4));
    }

To try to decipher which invariant (if any) was violated, I inspected the variables and registers through gdb.

#0  0x00007f1caaa58cbb in voAACEnc_pow2_xy (x=<optimized out>, y=y@entry=28) at aacenc/basic_op/oper_32b.c:358
358      res = pow2Table[(POW2_TABLE_SIZE*fPart)/y] >> iPart;
# Print local variables
(gdb) info locals
iPart = <optimized out>
fPart = 4294967290
res = <optimized out>
tmp = <optimized out>
tmp2 = <optimized out>
shift = <optimized out>
shift2 = <optimized out>
# Dump registers
(gdb) info reg
rax            0x9249212           153391634
rbx            0x604               1540
rcx            0x0                 0
rdx            0x7f1caaa6b000      139761098862592
rsi            0x1c                28 # y
rdi            0xfffffffa          4294967290 # fpart
rbp            0xfffffffe          0xfffffffe
rsp            0x7f1c3e1fadc8      0x7f1c3e1fadc8
r8             0x1fff8000          536838144
r9             0x1fe               510
r10            0x200               512
r11            0x1                 1
r12            0x4                 4
r13            0x1fe               510
r14            0x7f1c3e1fae84      139759278075524
r15            0x6c4               1732
rip            0x7f1caaa58cbb      0x7f1caaa58cbb <voAACEnc_pow2_xy+43>
# Go up a frame
(gdb) f 1
#1  0x00007f1caaa5bbda in allowMoreHoles (desiredPe=<optimized out>, nChannels=<optimized out>, ahParam=0x7f1c74021082, ahFlag=0x7f1c3e1fb354, peData=0x7f1c3e1fae90, psyOutElement=0x7f1c74022228,
    psyOutChannel=0x7f1c74022488) at aacenc/src/adj_thr.c:676
676          en[enIdx] = fixmul(avgEn, pow2_xy(L_negate(enFac),7*4));
(gdb) info locals
enFac = <optimized out>
enDiff = <optimized out>
avgEn = <optimized out>
minEn = <optimized out>
ahCnt = <optimized out>
en = {4, 28, 158, 854}
maxSfb = <optimized out>
done = <optimized out>
startSfb = {15, 0}
enIdx = <optimized out>
minSfb = <optimized out>
ch = <optimized out>
sfb = <optimized out>
actPe = 1732
shift = <optimized out>
ch = <optimized out>
sfb = <optimized out>
actPe = <optimized out>
shift = <optimized out>
psyOutChanL = <optimized out>
psyOutChanR = <optimized out>
minEn = <optimized out>
startSfb = <optimized out>
avgEn = <optimized out>
minEn = <optimized out>
ahCnt = <optimized out>
enIdx = <optimized out>
enDiff = <optimized out>
en = <optimized out>
minSfb = <optimized out>
maxSfb = <optimized out>
done = <optimized out>
psyOutChan = <optimized out>
iahCnt = <optimized out>
enFac = <optimized out>
psyOutChan = <optimized out>

Because this is a heavily-optimised release build, the core dump is stubbornly refusing to reveal most of the variables state. From the above we know y is 28 and fPart is an absurdly large 0xfffffffa. This points to some kind of overflow. We don't know x.

To figure out what x might have been, I modified the function and fuzzed over a range of x:

Word32 pow2_xy(Word32 x, Word32 y)
{
  UWord32 iPart;
  UWord32 fPart;
  Word32 res;
  Word32 tmp, tmp2;
  Word32 shift, shift2;

  tmp2 = -x;
  iPart = tmp2 / y;
  fPart = tmp2 - iPart*y;
  iPart = min(iPart,INT_BITS-1);

  if (fPart == 4294967290) {
    printf("**reproduced!**\n");
    return 0;
  }
  if ((POW2_TABLE_SIZE*fPart)/y >= 256) {
    printf("invalid state\n");
    return 0;
  }

  res = pow2Table[(POW2_TABLE_SIZE*fPart)/y] >> iPart;

  return(res);
}

int main(int argc, char *argv[])
{
    for (int i = -10; i < 10; i++) {
        int res = pow2_xy(i, 28);
        printf("pow2_xy(%d, %d) = %d\n", i, 28, res);
    }
}

Prints the following:

pow2_xy(-10, 28) = 1678506817
pow2_xy(-9, 28) = 1719911875
pow2_xy(-8, 28) = 1762338305
pow2_xy(-7, 28) = 1805811302
pow2_xy(-6, 28) = 1855373507
pow2_xy(-5, 28) = 1901141476
pow2_xy(-4, 28) = 1948038440
pow2_xy(-3, 28) = 1996092249
pow2_xy(-2, 28) = 2045331439
pow2_xy(-1, 28) = 2095785251
pow2_xy(0, 28) = 2147483647
invalid state
pow2_xy(1, 28) = 0
invalid state
pow2_xy(2, 28) = 0
invalid state
pow2_xy(3, 28) = 0
invalid state
pow2_xy(4, 28) = 0
invalid state
pow2_xy(5, 28) = 0
**reproduced!**
pow2_xy(6, 28) = 0
invalid state
pow2_xy(7, 28) = 0
invalid state
pow2_xy(8, 28) = 0
invalid state
pow2_xy(9, 28) = 0

So it is probable that we are invoking pow2_xy with x = 6 causing our bot to crash abruptly.

I next looked at a simplified version of the calling code from earlier (with the for loop unrolled), to see if that sheds any light:

    enDiff = iLog4(avgEn) - iLog4(minEn);
    /* calc some energy borders between minEn and avgEn */
    en[0] = fixmul(avgEn, pow2_xy(-(6 * enDiff), 7*4));
    en[1] = fixmul(avgEn, pow2_xy(-(4 * enDiff), 7*4));
    en[2] = fixmul(avgEn, pow2_xy(-(2 * enDiff), 7*4));
    en[3] = fixmul(avgEn, pow2_xy(0, 7*4));

That seems to suggest that if enDiff = -1 it would reproduce our crash exactly. In general if enDiff < 0 for any reason, this would trigger a segfault in pow2_xy. Because enDiff = iLog4(avgEn) - iLog4(minEn);, if enDiff < 0 then iLog4(avgEn) < iLog4(minEn) which implies that avgEn < minEn. If average(energy) < minimum(energy) something else has gone very wrong.

If we could reproduce this locally, then we could step the code and (hopefully) pinpoint exactly where things started to go awry. Reading through the libvo-aacenc source some more, it seems that the encoding functions are pure over the encoder state. Or put simply, they only rely on the current state of the encoder to operate. So what if we dump the encoder state from our core dump resume the execution locally? gdb's dump binary memory is our friend here. Inspecting our encoder structure in gdb:

(gdb) p *(AAC_ENCODER*)hCodec
$58 = {config = {sampleRate = 16000, bitRate = 21333, nChannelsIn = 1, nChannelsOut = 1, bandWidth = 15000, adtsUsed = 0}, elInfo = {elType = ID_SCE, instanceTag = 0, nChannelsInEl = 1, ChannelIndex = {0, 1}},
  qcKernel = {averageBitsTot = 1368, maxBitsTot = 6144, globStatBits = 3, nChannels = 1, bitResTot = 4592, maxBitFac = 395, padding = {paddingRest = 14720}, elementBits = {chBitrate = 21333, averageBits = 1365,
      maxBits = 6144, bitResLevel = 4592, maxBitResBits = 4776, relativeBits = 16384}, adjThr = {bresParamLong = {clipSaveLow = 20, clipSaveHigh = 95, minBitSave = -5, maxBitSave = 30, clipSpendLow = 20,
        clipSpendHigh = 95, minBitSpend = -10, maxBitSpend = 40}, bresParamShort = {clipSaveLow = 20, clipSaveHigh = 75, minBitSave = 0, maxBitSave = 20, clipSpendLow = 20, clipSpendHigh = 75, minBitSpend = -5,
        maxBitSpend = 50}, adjThrStateElem = {peMin = 3690, peMax = 4898, peOffset = 50, ahParam = {modifyMinSnr = 1 '\001', startSfbL = 15, startSfbS = 3}, minSnrAdaptParam = {maxRed = 536870912,
  //// snip ////
              0}, {0, 0, 0, 0, 0, 0, 0, 0}}, windowNrgF = {{0, 0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 0}}, iirStates = {0, 0}, maxWindowNrg = 0, accWindowNrg = 0}, mdctDelayBuffer = 0x7f1c7402b160,
        sfbThresholdnm1 = {41468, 13120, 6576, 2080, 1312, 1312, 1312, 1312, 1312, 1312, 2624, 2624, 2624, 2624, 2624, 2624, 2624, 3936, 3936, 3936, 3936, 5248, 5248, 6560, 6560, 7872, 7872, 9184, 9184, 20960,
          20960, 33184, 33184, 33184, 104960, 104960, 104960, 1048544, 1048544, 1048544, 1048544, 10502400, 10502400, 10502400, 10502400, 10502400, 10502400, 10502400, 31507200, 0, 0}, mdctScalenm1 = 0,
        sfbThreshold = {sfbLong = {0 <repeats 60 times>}, sfbShort = {{0 <repeats 15 times>}, {0 <repeats 15 times>}, {0 <repeats 15 times>}, {0 <repeats 15 times>}, {0 <repeats 15 times>}, {
  //// snip ////
                  0 <repeats 12 times>}, predictionGain = 0}, {tnsActive = 0, parcor = {0 <repeats 12 times>}, predictionGain = 0}, {tnsActive = 0, parcor = {0 <repeats 12 times>}, predictionGain = 0}}}}}},
    pScratchTns = 0x7f1c740284a0, sampleRateIdx = 8}, bseInit = {nChannels = 1, bitrate = 21333, sampleRate = 16000, profile = 1}, bitStream = {pBitBufBase = 0x7f1c2c00c6e0 "\0016\365\2106H#\032\006\3032",
    pBitBufEnd = 0x7f1c2c00cedf "\234\220", pWriteNext = 0x7f1c2c00c6e0 "\0016\365\2106H#\032\006\3032", cache = 0, wBitPos = 0, cntBits = 0, size = 16384, isValid = 1}, hBitStream = 0x7f1c74025378, initOK = 0,
  intbuf = 0x7f1c74025440, encbuf = 0x354b000, inbuf = 0x354b000, enclen = 1024, inlen = 1024, intlen = 0, uselength = 0, hCheck = 0x0, voMemop = 0x3547dc0, voMemoprator = {Alloc = 0x0, Free = 0x0, Set = 0x0,
    Copy = 0x0, Check = 0x0, Compare = 0x0, Move = 0x0}}

The encoder state is a large and complex structure. Oh well, we should still be able to reconstruct it...

# Dump our encoder state variables...
(gdb) dump binary memory quantSpec.bin 0x7f1c7402be00 0x7f1c7402be00+1024
(gdb) dump binary memory maxValueInSfb.bin 0x7f1c7401a660 0x7f1c7401a660+60
(gdb) dump binary memory scf.bin 0x7f1c74017820 0x7f1c74017820+60
(gdb) dump binary memory quantSpec2.bin 0x7f1c7402c600 0x7f1c7402c600+1024
(gdb) dump binary memory maxValueInSfb2.bin 0x7f1c7401a6d8 0x7f1c7401a6d8+60
(gdb) dump binary memory scf2.bin 0x7f1c74017898 0x7f1c74017898+60
(gdb) dump binary memory sfbEnergy.bin 0x7f1c74023970 0x7f1c74023970+100
(gdb) dump binary memory sfbSpreadedEnergy.bin 0x7f1c74023f58 0x7f1c74023f58+100
(gdb) dump binary memory quantSpec.bin 0x7f1c7402be00 0x7f1c7402be00+10240
(gdb) dump binary memory sfbThreshold.bin 0x7f1c740236a0 0x7f1c740236a0+10240
(gdb) dump binary memory maxValueInSfb.bin 0x7f1c7401a660 0x7f1c7401a660+10240
(gdb) dump binary memory scf.bin 0x7f1c74017820 0x7f1c74017820+10240
(gdb) dump binary memory quantSpec2.bin 0x7f1c7402c600 0x7f1c7402c600+10240
(gdb) dump binary memory maxValueInSfb2.bin 0x7f1c7401a6d8 0x7f1c7401a6d8+10240
(gdb) dump binary memory scf2.bin 0x7f1c74017898 0x7f1c74017898+10240
(gdb) dump binary memory sfbEnergy.bin 0x7f1c74023970 0x7f1c74023970+10240
(gdb) dump binary memory sfbSpreadedEnergy.bin 0x7f1c74023f58 0x7f1c74023f58+10240
(gdb) dump binary memory sfbThreshold.bin 0x7f1c740236a0 0x7f1c740236a0+10240
(gdb) dump binary memory mdctSpectrum.bin 0x7f1c74026480 0x7f1c74026480+10240
(gdb) dump binary memory sfbOffset.bin 0x7f1caaa6cbe0 0x7f1caaa6cbe0+10240
(gdb) dump binary memory sfbOffset2.bin 0x7f1caaa6cd38 0x7f1caaa6cd38+10240
(gdb) dump binary memory mdctDelayBuffer.bin 0x7f1c7402a4e0 0x7f1c7402a4e0+10240
(gdb) dump binary memory mdctDelayBuffer2.bin 0x7f1c7402b160 0x7f1c7402b160+10240
(gdb) dump binary memory mdctSpectrum2.bin 0x7f1c74027480 0x7f1c74027480+10240
(gdb) dump binary memory pScratchTns.bin 0x7f1c740284a0 0x7f1c740284a0+10240
(gdb) dump binary memory bitBufBase.bin 0x7f1c2c00c6e0 0x7f1c2c00c6e0+102400
(gdb) dump binary memory hBitStream.bin 0x7f1c74025378 0x7f1c74025378+102400
(gdb) dump binary memory intbuf.bin 0x7f1c74025440 0x7f1c74025440+102400
(gdb) dump binary memory encbuf.bin 0x354b000 0x354b000+10240
# Note down relative offsets...
(gdb) p 0x7f1c2c00cedf - 0x7f1c2c00c6e0 # bitBufEnd
$12 = 2047
(gdb) p 0x7f1c2c00c6e0 - 0x7f1c2c00c6e0 # pWriteNext
$13 = 0

Now we can hack together some C to restore our encoder state from the memory dumps:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <aacenc/inc/aacenc_core.h>
#include <aacenc/basic_op/oper_32b.h>
#include <aacenc/basic_op/oper_32b.c>
#include <common/include/voAAC.h>
#include <common/include/cmnMemory.h>
#include "wavreader.h"

// loads a file into an malloc'd region
void *read_to_mem(char *filename)
{
    struct stat st;
    if (stat(filename, &st) != 0)
    {
        printf("stat failed: %s", filename);
        exit(1);
    }

    void *buf = (void *)malloc(st.st_size);
    FILE *file = fopen(filename, "r");

    if (fread((unsigned char *)buf, 1, st.st_size, file) <= 0)
    {
        printf("read failed: %s", filename);
        exit(1);
    }
    fclose(file);

    printf("loaded %s into buffer size %lld\n", filename, (long long)st.st_size);

    return buf;
}

int main(int argc, char *argv[])
{
    VO_AUDIO_CODECAPI codec_api = {0};
    // load the root encoder state
    AAC_ENCODER *handle = (AAC_ENCODER *)read_to_mem("memdump/encoder-state.bin");
    // update referenced pointers
    handle->qcOut.qcChannel[0].quantSpec = (Word16 *)read_to_mem("memdump/quantSpec.bin");
    handle->qcOut.qcChannel[0].maxValueInSfb = (Word16 *)read_to_mem("memdump/maxValueInSfb.bin");
    handle->qcOut.qcChannel[0].scf = (Word16 *)read_to_mem("memdump/scf.bin");
    handle->qcOut.qcChannel[1].quantSpec = (Word16 *)read_to_mem("memdump/quantSpec2.bin");
    handle->qcOut.qcChannel[1].maxValueInSfb = (Word16 *)read_to_mem("memdump/maxValueInSfb2.bin");
    handle->qcOut.qcChannel[1].scf = (Word16 *)read_to_mem("memdump/scf2.bin");
    handle->psyOut.psyOutChannel[0].sfbEnergy = (Word32 *)read_to_mem("memdump/sfbEnergy.bin");
    handle->psyOut.psyOutChannel[0].sfbSpreadedEnergy = (Word32 *)read_to_mem("memdump/sfbSpreadedEnergy.bin");
    handle->psyOut.psyOutChannel[0].sfbThreshold = (Word32 *)read_to_mem("memdump/sfbThreshold.bin");
    handle->psyOut.psyOutChannel[0].mdctSpectrum = (Word32 *)read_to_mem("memdump/mdctSpectrum.bin");
    handle->psyKernel.psyConfLong.sfbOffset = (Word16 *)read_to_mem("memdump/sfbOffset.bin");
    handle->psyKernel.psyConfShort.sfbOffset = (Word16 *)read_to_mem("memdump/sfbOffset2.bin");
    handle->psyKernel.psyData[0].mdctDelayBuffer = (Word16 *)read_to_mem("memdump/mdctDelayBuffer.bin");
    handle->psyKernel.psyData[0].mdctSpectrum = (Word32 *)read_to_mem("memdump/mdctSpectrum.bin");
    handle->psyKernel.psyData[1].mdctDelayBuffer = (Word16 *)read_to_mem("memdump/mdctDelayBuffer2.bin");
    handle->psyKernel.psyData[1].mdctSpectrum = (Word32 *)read_to_mem("memdump/mdctSpectrum2.bin");
    handle->psyKernel.pScratchTns = (Word32 *)read_to_mem("memdump/pScratchTns.bin");
    handle->bitStream.pBitBufBase = (UWord8 *)read_to_mem("memdump/bitBufBase.bin");
    handle->hBitStream = (HANDLE_BIT_BUF)read_to_mem("memdump/hBitStream.bin");
    handle->hBitStream->pBitBufBase = (UWord8 *)read_to_mem("memdump/bitBufBase.bin");
    handle->intbuf = (short *)read_to_mem("memdump/intbuf.bin");
    handle->encbuf = (short *)read_to_mem("memdump/encbuf.bin");
    // update relative pointers
    handle->bitStream.pBitBufEnd = handle->bitStream.pBitBufBase + 2047;
    handle->bitStream.pWriteNext = handle->bitStream.pBitBufBase;
    handle->hBitStream->pBitBufEnd = handle->hBitStream->pBitBufBase + 2047;
    handle->hBitStream->pWriteNext = handle->hBitStream->pBitBufBase;
    handle->inbuf = handle->encbuf;

    // update function pointers
    handle->voMemop = (VO_MEM_OPERATOR *)malloc(sizeof(VO_MEM_OPERATOR));
    handle->voMemop->Alloc = cmnMemAlloc;
    handle->voMemop->Copy = cmnMemCopy;
    handle->voMemop->Free = cmnMemFree;
    handle->voMemop->Set = cmnMemSet;
    handle->voMemop->Check = cmnMemCheck;

    VO_CODECBUFFER *output = (VO_CODECBUFFER *)read_to_mem("memdump/output.bin");
    output->Buffer = (VO_PBYTE)read_to_mem("memdump/output-buf.bin");
    VO_AUDIO_OUTPUTINFO output_info = {0};

    voGetAACEncAPI(&codec_api);

    // trigger the encoder!
    unsigned int res = codec_api.GetOutputData((VO_HANDLE)handle, output, &output_info);
    if (res != VO_ERR_NONE)
    {
        fprintf(stderr, "Unable to encode frame: %u\n", res);
        return 1;
    }

    codec_api.Uninit(handle);

    fprintf(stderr, "Done\n");
    return 0;
}

Now we should be able to execute the encoder and reproduce the crash!

> make && ./aac-enc 
# snip...
loaded memdump/encoder-state.bin into buffer size 17408
loaded memdump/quantSpec.bin into buffer size 10240
loaded memdump/maxValueInSfb.bin into buffer size 10240
loaded memdump/scf.bin into buffer size 10240
loaded memdump/quantSpec2.bin into buffer size 10240
loaded memdump/maxValueInSfb2.bin into buffer size 10240
loaded memdump/scf2.bin into buffer size 10240
loaded memdump/sfbEnergy.bin into buffer size 10240
loaded memdump/sfbSpreadedEnergy.bin into buffer size 10240
loaded memdump/sfbThreshold.bin into buffer size 10240
loaded memdump/mdctSpectrum.bin into buffer size 10240
loaded memdump/sfbOffset.bin into buffer size 10240
loaded memdump/sfbOffset2.bin into buffer size 10240
loaded memdump/mdctDelayBuffer.bin into buffer size 10240
loaded memdump/mdctSpectrum.bin into buffer size 10240
loaded memdump/mdctDelayBuffer2.bin into buffer size 10240
loaded memdump/mdctSpectrum2.bin into buffer size 10240
loaded memdump/pScratchTns.bin into buffer size 10240
loaded memdump/bitBufBase.bin into buffer size 102400
loaded memdump/hBitStream.bin into buffer size 102400
loaded memdump/bitBufBase.bin into buffer size 102400
loaded memdump/intbuf.bin into buffer size 102400
loaded memdump/encbuf.bin into buffer size 10240
loaded memdump/output.bin into buffer size 8
loaded memdump/output-buf.bin into buffer size 2048
Done

Done!? Wait, where is our burning pile of invalid memory dereferences? It seems that the resumed encoder exited cleanly.

Reconstructing the caller state

What if we add some logging to pow2_xy?

# snip
pow2_xy(-104, 320)
pow2_xy(-141, 320)
pow2_xy(-58, 320)
pow2_xy(-64, 320)
pow2_xy(-40, 320)
pow2_xy(-68, 320)
pow2_xy(-136, 320)
pow2_xy(-40, 320)
pow2_xy(-40, 320)
pow2_xy(-40, 320)
pow2_xy(-64, 320)
pow2_xy(-40, 320)
pow2_xy(-40, 320)
pow2_xy(-90, 320)
pow2_xy(-94, 320)
pow2_xy(-130, 320)
pow2_xy(-82, 320)
pow2_xy(-104, 320)
pow2_xy(-126, 320)
pow2_xy(-86, 320)
pow2_xy(-58, 320)
pow2_xy(-166, 320)
pow2_xy(-112, 320)
pow2_xy(-76, 320)
pow2_xy(-108, 320)
pow2_xy(-10769, 1380)
pow2_xy(-10504, 1380)
pow2_xy(-66, 28)
pow2_xy(-44, 28)
pow2_xy(-22, 28)
pow2_xy(0, 28)
Done

There are no invalid x values here either. Stepping through the code using gdb makes it clear as to why. Many of the internal functions mutate the encoder state throughout the lifetime of the GetOutputData entrypoint. So the state of the the encoder that we restored was the state at the instant of the segfault and not the actual state when the bot called GetOutputData prior to the crash. Unfortunately we cannot recover the original encoder state from the core dump either.

So let us go deeper then. How about if we restore the state at the frame above pow2_xy? Lets dump the memory from those arguments and try again:

# Go up a frame to "allowMoreHoles"
(gdb) f 1
#1  0x00007f1caaa5bbda in allowMoreHoles (desiredPe=<optimized out>, nChannels=<optimized out>, ahParam=0x7f1c74021082, ahFlag=0x7f1c3e1fb354, peData=0x7f1c3e1fae90, psyOutElement=0x7f1c74022228,
    psyOutChannel=0x7f1c74022488) at aacenc/src/adj_thr.c:676
676          en[enIdx] = fixmul(avgEn, pow2_xy(L_negate(enFac),7*4));
# Show the arguments
(gdb) info args
desiredPe = <optimized out>
nChannels = <optimized out>
ahParam = 0x7f1c74021082
ahFlag = 0x7f1c3e1fb354
peData = 0x7f1c3e1fae90
psyOutElement = 0x7f1c74022228
psyOutChannel = 0x7f1c74022488
# Dump the argument state
(gdb) dump binary memory psyOutChannel.bin 0x7f1c74022488 0x7f1c74022488+sizeof(PSY_OUT_CHANNEL)*2
(gdb) dump binary memory sfbEnergy.bin 0x7f1c74023970 0x7f1c74023970+100
(gdb) dump binary memory sfbSpreadedEnergy.bin 0x7f1c74023f58 0x7f1c74023f58+100
(gdb) dump binary memory sfbThreshold.bin 0x7f1c740236a0 0x7f1c740236a0+10240
(gdb) dump binary memory mdctSpectrum.bin 0x7f1c74026480 0x7f1c74026480+10240
(gdb) dump binary memory peData.bin 0x7f1c3e1fae90 0x7f1c3e1fae90+sizeof(PE_DATA)
(gdb) dump binary memory ahFlag.bin 0x7f1c3e1fb354 0x7f1c3e1fb354+1024

Now revise our main:

int main(int argc, char *argv[])
{
    PSY_OUT_CHANNEL* psyOutChannel = (PSY_OUT_CHANNEL *)read_to_mem("memdump/psyOutChannel.bin");
    psyOutChannel[0].sfbEnergy = (Word32 *)read_to_mem("memdump/sfbEnergy.bin");
    psyOutChannel[0].sfbSpreadedEnergy = (Word32 *)read_to_mem("memdump/sfbSpreadedEnergy.bin");
    psyOutChannel[0].sfbThreshold = (Word32 *)read_to_mem("memdump/sfbThreshold.bin");
    psyOutChannel[0].mdctSpectrum = (Word32 *)read_to_mem("memdump/mdctSpectrum.bin");
    PE_DATA* peData = (PE_DATA*)read_to_mem("memdump/peData.bin");
    Word16* ahFlag = (Word16*)read_to_mem("memdump/ahFlag.bin");
    Word16* ahParam = (Word16*)read_to_mem("memdump/ahParam.bin");

    allowMoreHoles(psyOutChannel, NULL, peData, ahFlag, ahParam, 1, 1540);
}

Finally, we can run the program:

> make && ./aac-enc
# snip...
loaded memdump/psyOutChannel.bin into buffer size 3040
loaded memdump/sfbEnergy.bin into buffer size 10240
loaded memdump/sfbSpreadedEnergy.bin into buffer size 10240
loaded memdump/sfbThreshold.bin into buffer size 10240
loaded memdump/mdctSpectrum.bin into buffer size 10240
loaded memdump/peData.bin into buffer size 2420
loaded memdump/ahFlag.bin into buffer size 1024
loaded memdump/ahParam.bin into buffer size 6
pow2_xy(6, 28)
Segmentation fault

Success! It seems we can now reproduce the crash, even if it is directly from the above stack frame.

Identifying the root cause

Now we can modify our compiler flags with -O0 to disable optimisations and inspect the local state during execution.

> gdb ./.libs/aac-enc.
Reading symbols from ./.libs/aac-enc...
(gdb) run
Starting program: /storage/workspace/bots/debug/vo-aacenc-0.1.3/.libs/aac-enc 
# snip...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f942e6 in voAACEnc_pow2_xy (x=6, y=28) at aacenc/basic_op/oper_32b.c:370
370       res = pow2Table[(POW2_TABLE_SIZE*fPart)/y] >> iPart;
(gdb) bt
#0  0x00007ffff7f942e6 in voAACEnc_pow2_xy (x=6, y=28) at aacenc/basic_op/oper_32b.c:370
#1  0x00007ffff7f97ea0 in allowMoreHoles (psyOutChannel=0x55555555a2a0, psyOutElement=0x0, peData=0x5555555654c0, ahFlag=0x5555555702b0, ahParam=0x555555566250, 
    nChannels=1, desiredPe=1540) at aacenc/src/adj_thr.c:714
#2  0x0000555555556402 in main (argc=1, argv=0x7fffffffe058) at aac-enc.c:70
(gdb) f 1
#1  0x00007ffff7f97ea0 in allowMoreHoles (psyOutChannel=0x55555555a2a0, psyOutElement=0x0, peData=0x5555555654c0, ahFlag=0x5555555702b0, ahParam=0x555555566250, 
    nChannels=1, desiredPe=1540) at aacenc/src/adj_thr.c:714
714         en[0] = fixmul(avgEn, pow2_xy(-(6 * enDiff), 7*4));
(gdb) info locals
enDiff = -1
avgEn = 510 # !!
minEn = 512 # !!
ahCnt = 1
en = {507315022, 0, 1679226483, 0}
maxSfb = 0
done = 0
startSfb = {15, 0}
enIdx = 0
minSfb = 1000
ch = 1
sfb = 43
actPe = 1732
shift = 30
(gdb) 

We have just confirmed our previous conjecture that avgEn < minEn! So why does this happen? Lets step through the relevant code.

# Set a breakpoint
(gdb) b allowMoreHoles
# Start the program
(gdb) run
Starting program: /storage/workspace/bots/debug/vo-aacenc-0.1.3/.libs/aac-enc 
Breakpoint 1, allowMoreHoles (psyOutChannel=0x55555555a2a0, psyOutElement=0x0, peData=0x5555555654c0, ahFlag=0x5555555702b0, ahParam=0x555555566250, nChannels=1, desiredPe=1540) at aacenc/src/adj_thr.c:699
699         if(ahCnt) {
# Print out minEn and avgEn after initial calculation
(gdb) p minEn 
$1 = 512
(gdb) p avgEn
$2 = 512
# They look good at this stage, keep going...
(gdb) n
701           shift = norm_l(ahCnt);
(gdb) n
702               iahCnt = Div_32( 1 << shift, ahCnt << shift );
(gdb) n
703           avgEn = fixmul(avgEn, iahCnt);
# We just updated avgEn, lets look again
(gdb) p avgEn
$8 = 510
# Oh no, so this `fixmul` nudged `avgEn` below `minEn` 
(gdb) n
707         enDiff = iLog4(avgEn) - iLog4(minEn);
# And that means enDiff is now negative...
(gdb) p enDiff
$10 = -1
# And all hell breaks loose...
(gdb) n
714         en[0] = fixmul(avgEn, pow2_xy(-(6 * enDiff), 7*4));
(gdb) n
pow2_xy(6, 28)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f942e6 in voAACEnc_pow2_xy (x=6, y=28) at aacenc/basic_op/oper_32b.c:370
370       res = pow2Table[(POW2_TABLE_SIZE*fPart)/y] >> iPart;

So now we know at least roughly where things have gone wrong. The fixed-point calculation of the average energy avgEn is introducing an error causing avgEn < minEn in rare cases.

Understanding the fixed-point math

But what is actually happening in norm_l, Div_32 and fixmul that leads to this error? The source code mentions these operations are in "double precision" however most of the values are stored in 32-bit integers. Despite the name, this “double precision” format is a custom fixed-point representation, not IEEE floating point.

What is going on here? Google leads us to this relic, last modified in 1996 by the France Telecom:

/*___________________________________________________________________________
 |                                                                           |
 |  This file contains operations in double precision.                       |
 |  These operations are not standard double precision operations.           |
 |  They are used where single precision is not enough but the full 32 bits  |
 |  precision is not necessary. For example, the function Div_32() has a     |
 |  24 bits precision which is enough for our purposes.                      |
 |                                                                           |
 |  The double precision numbers use a special representation:               |
 |                                                                           |
 |     L_32 = hi<<16 + lo<<1                                                 |
 |                                                                           |
 |  L_32 is a 32 bit integer.                                                |
 |  hi and lo are 16 bit signed integers.                                    |
 |  As the low part also contains the sign, this allows fast multiplication. |
 |                                                                           |
 |      0x8000 0000 <= L_32 <= 0x7fff fffe.                                  |
 |                                                                           |
 |  We will use DPF (Double Precision Format )in this file to specify        |
 |  this special format.                                                     |
 |___________________________________________________________________________|
*/

Ahhh, so is a special kind of "double precision" split across 32 bits. Now with this in mind we can annotate the relevant code:

if(ahCnt) {
  // calc how many bits needed to shift ahCnt into the normalised range [0x40000000, 0x7fff0000]
  shift = norm_l(ahCnt);
  // perform fixed-width division of normalised 1 and ahCnt
  iahCnt = Div_32(1 << shift, ahCnt << shift); 
  // perform fixed-width multiplication of avgEn and iahCnt 
  avgEn = fixmul(avgEn, iahCnt);
}

So where are we losing precision then? Lets step through our scenario in gdb:

# Break at `main`
(gdb) b main
Breakpoint 1 at 0x22f0: file aac-enc.c, line 60.
# Start the program
(gdb) run
Starting program: /storage/workspace/bots/debug/vo-aacenc-0.1.3/.libs/aac-enc 
Breakpoint 1, main (argc=1, argv=0x7fffffffe068) at aac-enc.c:60
# Set up the input
(gdb) set $avgEn = 512
# Calculate the normalisation left shift for ahCnt = 1
(gdb) set $shift = norm_l(1)
(gdb) p $shift
$1 = 30
# Perform the shifts
(gdb) set $num = 1 << $shift
(gdb) set $denom = 1 << $shift
# Perform the division
(gdb) set $iahCnt = voAACEnc_Div_32($num, $denom)
# Perform the multiplication
(gdb) set $mul = (uint64_t)$avgEn * (uint64_t)$iahCnt
(gdb) set $avgEn = ($mul >> 32) << 1
# Observe the results
(gdb) p (double)$iahCnt / (double)((uint64_t)2 << 31)
$3 = 0.4999999962747097
(gdb) p (double)$mul / (double)((uint64_t)2 << 31)
$3 = 254.99999810010195
(gdb) p $avgEn
$5 = 510

There are a couple of things that do not look right. Firstly, we are executing Div_32($num, $denom) where $num = $demon breaking the num < denom invariant. We then lose some precision in our Div_32 call and this propagates to our 64-bit multiplication of $avgEn, at this stage the floating point equivalent is ~255 and our error is -1. The final left-shift then doubles the error to -2 and we are left with 510.

What if Div_32 did not lose precision?

# Break at `main`
(gdb) b main
Breakpoint 1 at 0x22f0: file aac-enc.c, line 60.
# Start the program
(gdb) run
Starting program: /storage/workspace/bots/debug/vo-aacenc-0.1.3/.libs/aac-enc 
Breakpoint 1, main (argc=1, argv=0x7fffffffe068) at aac-enc.c:60
# Set up the input
(gdb) set $avgEn = 512
# 2 << 30 is exactly floating point 0.5 in Q31
(gdb) set $iahCnt = (uint64_t)2 << 30
# Perform the multiplication
(gdb) set $mul = (uint64_t)$avgEn * (uint64_t)$iahCnt
(gdb) set $avgEn = ($mul >> 32) << 1
# Observe the results
(gdb) p (double)$iahCnt / (double)((uint64_t)2 << 31)
$3 = 0.5
(gdb) p (double)$mul / (double)((uint64_t)2 << 31)
$3 = 256
(gdb) p $avgEn
$5 = 512

Voila! avgEn is now exactly what we expect it to be after dividing it by 1.

Implementing a fix

So how do we fix this? Could we modify Div_32 to never lose precision? Probably not without harming its performance characteristics.

In this particular case the local invariant is minEn < avgEn. So lets just add implement a lower-bound for avgEn:

if(ahCnt) {
  Word32 iahCnt;
  shift = norm_l(ahCnt);
  iahCnt = Div_32( 1 << shift, ahCnt << shift );
  avgEn = fixmul(avgEn, iahCnt);
  if (avgEn < minEn) {
    avgEn = minEn;
  }
}

Now running the program again:

> make && ./aac-enc 
# snip...
loaded memdump/ahFlag.bin into buffer size 1024
pow2_xy(0, 28)
pow2_xy(0, 28)
pow2_xy(0, 28)
pow2_xy(0, 28)
> echo $?
0

That seems to have worked! At this point I was struggling to believe that nobody else had encountered this issue previously. I scoured Google, GitHub and finally Bing (of all places) to find this mailing list thread. A quick read reveals that it is almost certainly referring to the same bug. And that it was fixed upstream nearly 12 years ago! However, the fix was never released to the gstreamer plugin, and this bug is still present today.

Conclusion

Although it was satisfying to deeply understand and fix the bug that had been causing these sporadic crashes, this encoder is clearly a liability and we should consider modern alternatives.

Building on the libav AAC encoder used in ffmpeg and available as a gstreamer plugin sounds extremely boring, so it is the obvious choice.