jellybee system internals

Abstractions, Aspects, and PIC Modularity: The Compiler Subversion 1

Compiler Internals

Swarm Factory

Kerberos build path, AOP, COFF, linker, and pinit

The goal of Swarm Factory in the system is to dissect projects and tools, compile selected functionality into PIC, and add capabilities and evasion layers. The aspect-oriented idea came later: during transformation, a capability can be isolated at a specific phase and composed back as a cross-cutting concern. That changed my initial assumptions about section handling.

Aspect-oriented programming is a paradigm for adding functionality to a base program from one controlled point, even when that functionality would normally require code spread across many places. The classic example is event logging: the base program may have no integrated logging, while AOP provides a way to apply that concern uniformly across multiple points of interest.

This AOP framing was influenced by the Tradecraft Garden ecosystem developed by Raphael Mudge, the creator of Cobalt Strike and a Metasploit contributor. I will leave the broader analysis for later posts. First, we will examine our "compiler" in depth, what it returns, and how its telemetry helps us branch into different technologies.

Build JSON Artifacts

The build JSON artifacts act as contracts between the preparator, the transpilador, the linker, the graphical view, and the debugging layer. The current version consumes more structured outputs and relies less on heuristics: preparator detections, toolchain paths, the symbol-to-import-library index, AST metadata, the rewrite plan, the argument matrix, validation reports, and the final manifest.

  • PROJECT_ROOT/
    • pill/build_manifest.json (wrapper manifest: effective source, tools, phases, hashes, and final bin)
    • preparator/main_json/
      • report.json / report_advanced.json (source detections, per-phase artifacts, and API-DLL hints)
      • toolchain_scan.json (paths, sysroot, available libraries, and candidates)
      • importlib_symbol_index.json (symbol-to-import-library mapping)
    • pill/
      • correlation_phase1.json (enriched report with correlated tags, detections, and targets)
      • ast_scan.json / rewrite_plan.json (lightweight AST, call graph, literals, hotspots, and planned decisions)
      • ast_validation.json / ast_schema.json / ast_phase_diff.json (validation and source -> phase1..phase9 diff)
      • ast_rewrite_apply.json / ast_size_report.json (wrapper-applied rewrites and final PIC section dependencies)
      • pillArgsInstMatrix.json (GIMPLE matrix used to rebuild argc/argv)
      • desglobalization.json (global state removed, moved, or rejected)
      • *_orphanApiValidation.json (final orphan-symbol classification)
      • *_output_Matrix_rebuild.json (literal/log tags rebuilt outside the blob)
      • debug/
        • swarm_symbols_phase9.json (final blob offsets and symbols)

Native C to PIC Transformation, from Preprocessor to COFF

High-level relation between temporary PE, PIC blob, and PILL SWARM contract
Diagram 1. Build, compilation, and transformation pipeline.

The kdRajKit/JellyBeeSystem-BlogLabs repository includes three test PIC payloads and a P2P loader for debugging the flow outside the article.

Starting Source Code

main.c
#include <windows.h>

int main( void )
{
    MessageBoxA( NULL, "JellyBee", "PIC", MB_OK );
    return 0;
}

Core Emitted After Rajpilator

The following block shows the core of the final output: the pinit() contract, hash-based API resolution, and literals materialized on the stack. The real file may include Win32 headers and auxiliary runtime blocks around it.

main_pill_phase9.c
#include "runtime/api_hashes.h"
#include "runtime/pill.h"

typedef struct
{
    wchar_t* Buffer;
    size_t   Capacity;
    size_t   Length;
} JbSwarmPill;

void pinit( JbSwarmPill* Pill );
void pinit( JbSwarmPill* Pill )
{
    ULONG_PTR ( *JbApiMessageBoxA )() = JB_PILL_APIHP( MessageBoxA, ULONG_PTR ( * )() );

    unsigned char JbLitL11Raw[9];
    char* JbLitL11 = ( char* ) JbLitL11Raw;
    {
        volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL11Raw;
        Ptr[0] = 0x4A;
        Ptr[1] = 0x65;
        Ptr[2] = 0x6C;
        Ptr[3] = 0x6C;
        Ptr[4] = 0x79;
        Ptr[5] = 0x42;
        Ptr[6] = 0x65;
        Ptr[7] = 0x65;
        Ptr[8] = 0x00;
    }

    unsigned char JbLitL24Raw[4];
    char* JbLitL24 = ( char* ) JbLitL24Raw;
    {
        volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL24Raw;
        Ptr[0] = 0x50;
        Ptr[1] = 0x49;
        Ptr[2] = 0x43;
        Ptr[3] = 0x00;
    }

    JbApiMessageBoxA( ( ( void * ) 0 ), JbLitL11, JbLitL24, __MSABI_LONG( 0x00000000 ) );
    return;
}

ABI Entry, Layout, and pinit Contract

PROJECT_ROOT/pill contains abi.asm and pill.ld. The canonical generator lives in transpilador/app/transpilador.c; those files are materialized inside pill/ for each build.

Section Order

The linker script sets abi as the entrypoint, places .text.abi first, then .text.pinit, and then the remaining .text*. That order keeps the entry stub at the beginning of the blob and places pinit() immediately after it, ahead of the rest of the emitted code.

.data is preserved because it is neither PE metadata nor a loader convenience: it can hold initialized state that the code itself needs at runtime. If the compiler or the minimal runtime emits initialized globals, pointers, small tables, initialized buffers, or any writable object referenced from .text, discarding .data would break those references. The usual result would be an access to unmapped memory, a garbage read, or a write to an address that no longer matches the real blob layout.

pill.ld
ENTRY(abi)
SECTIONS
{
    . = __image_base__ + 0x1000;
    .text :
    {
        *(.text.abi)
        *(.text.pinit)
        *(.text*)
    }
    .data :
    {
        *(.data*)
    }
    /DISCARD/ : { *(.idata*) }
}

Generated ABI and Stack Alignment

The generated sequence appears once per build. The reading below breaks the stub down into concrete guarantees.

abi.asm
extern pinit
global abi

segment .text

abi:
    push rdi                    ; save rdi because it is the working base register
    mov rdi, rsp                ; preserve the original stack pointer in rdi
    and rsp, byte -0x10         ; align the stack to 16 bytes
    sub rsp, 0x20               ; reserve x64 ABI shadow space
    ; RCX arrives from the caller and holds JbSwarmPill* when present
    ; do not touch RCX, so --ctx-args can work at runtime

    call pinit                  ; call the C function
    ; EAX keeps the real pinit return value
    mov rsp, rdi                ; restore the original stack pointer
    pop rdi                     ; restore rdi
    ret                         ; return to the caller

Point-by-Point Reading

  • RCX is not touched before call pinit; therefore it can carry JbSwarmPill*.
  • RDX, R8, and R9 are not part of the base input contract; arguments and output travel inside Pill.
  • RSP is aligned to 16 bytes and Win64 shadow space is reserved before entering C.
  • RDI preserves the original stack value and is restored before returning.
  • EAX keeps the real pinit() return value; the stub does not force it to zero.
  • Any stack pivot, TEB manipulation, or alternative prologue must be introduced as its own ABI before assembly.
Register and stack preparation in abi.asm
Diagram 2. Register and stack contract in abi.asm.
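
The alignment guarantees listed above can be checked with plain integer arithmetic. The following is a minimal sketch that models the stub's stack operations on a simulated RSP value. The struct and function names are hypothetical and only mirror the instruction sequence shown in abi.asm; nothing here is taken from the JellyBee sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the RSP values observed while the abi stub runs. */
typedef struct
{
    uint64_t SavedRsp;      /* value copied into RDI after push rdi          */
    uint64_t RspBeforeCall; /* RSP after alignment and shadow-space reserve  */
    uint64_t RspAtPinit;    /* RSP on entry to pinit (return address pushed) */
} AbiStackModel;

AbiStackModel ModelAbiStub( uint64_t RspOnEntry )
{
    AbiStackModel Model;
    uint64_t Rsp = RspOnEntry;

    Rsp -= 8;                     /* push rdi                        */
    Model.SavedRsp = Rsp;         /* mov rdi, rsp                    */
    Rsp &= ~( uint64_t ) 0xF;     /* and rsp, -0x10                  */
    assert( ( Rsp & 0xF ) == 0 ); /* 16-byte aligned from here on    */
    Rsp -= 0x20;                  /* sub rsp, 0x20 (shadow space)    */
    Model.RspBeforeCall = Rsp;
    Model.RspAtPinit = Rsp - 8;   /* call pinit pushes the return VA */
    return Model;
}
```

Because 0x20 is a multiple of 16, RSP is still 16-byte aligned at the call instruction, so pinit() starts with RSP congruent to 8 modulo 16, which is the state a Win64 function expects at entry. Restoring RSP from SavedRsp undoes both the alignment and the reservation regardless of how many bytes the alignment step consumed.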

RCX Contract Toward pinit

After reading the stub, the important point is clear: abi aligns RSP and reserves shadow space, but it does not overwrite RCX. The pointer prepared by the loader can therefore cross intact into the C call.

pinit does not receive loose arguments through RDX, R8, or R9 in the base contract. Everything crosses through RCX as JbSwarmPill*: the output buffer, its capacity, the output counter, and, when applicable, the context argument vector.

pinit contract and JbSwarmPill passed through RCX
Diagram 3. Entry through RCX into pinit().

Minimal pinit Contract

  • pinit(JbSwarmPill* Pill) is the stable C entrypoint that receives the pointer transported through RCX.
  • The base output contract lives in Pill: Buffer, Capacity, and Length.
  • JbtoBindOut( Pill ) initializes that output and makes it the active buffer of the minimal runtime.
  • Optional arguments live in the Pill context, normally as CtxArgc and CtxArgv, and are consumed only if the loader marks them valid.

Args Hook and argv Instrumentation

The argument path is separate from the base contract. phase5A builds the minimal argc/argv environment only when the source needs it, reading the context from Pill and not from additional ABI registers. The transpilador accepts both flag names: --ctx-args and --args-ctx.

  • In materialized-argument mode, param_N and argvLocal are generated inside the function.
  • In context-argument mode, Pill->CtxArgc and Pill->CtxArgv are read only if the loader provides a valid vector.
  • The context vector is accepted when Pill exists, CtxArgv exists, and CtxArgc is between 1 and 64.
  • If the context is not valid, a minimal local fallback is kept for argc and argv.
  • If the source uses Argc or Argv, local aliases are created from argc and argv.
argc and argv captured in pinit with ctx args
argc/argv rebuilt in pinit().
Execution with ctx args from the loader and output visible in WinDbg
--ctx-args: arguments after the .bin, output in console/WinDbg.
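
The acceptance rule for the context vector can be sketched as follows. The struct layout is copied from the extended JbSwarmPill definition shown in this article; the helper names, the SKETCH_CTX_ARGC_MAX constant, and the shape of the fallback (program name only) are assumptions made for illustration, not the code that phase5A actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Layout copied from the article's extended JbSwarmPill definition. */
typedef struct
{
    wchar_t* Buffer;
    size_t   Capacity;
    size_t   Length;
    int      CtxArgc;
    char**   CtxArgv;
} JbSwarmPill;

#define SKETCH_CTX_ARGC_MAX 64 /* upper bound stated in the text */

/* Hypothetical helper: returns 1 when the loader-provided vector is usable. */
int SketchCtxArgsValid( const JbSwarmPill* Pill )
{
    return Pill != NULL
        && Pill->CtxArgv != NULL
        && Pill->CtxArgc >= 1
        && Pill->CtxArgc <= SKETCH_CTX_ARGC_MAX;
}

/* Hypothetical helper: selects the context vector or a local fallback. */
void SketchSelectArgs( const JbSwarmPill* Pill, char** Fallback,
                       int* Argc, char*** Argv )
{
    assert( Fallback != NULL && Argc != NULL && Argv != NULL );
    if ( SketchCtxArgsValid( Pill ) )
    {
        *Argc = Pill->CtxArgc;
        *Argv = Pill->CtxArgv;
    }
    else
    {
        *Argc = 1;        /* assumed fallback: program name only */
        *Argv = Fallback;
    }
}
```

Keeping the validity check in one predicate means the rest of pinit() never has to test Pill, CtxArgv, and CtxArgc separately.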

Control Structures and Unwanted Sections

Dense switch statements can be lowered into jump tables. In that case, the code no longer decides only with local comparisons: it computes an index, reads an entry from a table in .rdata, and jumps into a .text block. The typical object-level pattern is:

objdump.txt
movslq (%rsi,%rdx,4), %rdx
add    %rsi, %rdx
jmp    *%rdx

The dependency proof appears in the relocations: .rdata contains relative entries into .text. If that section is removed, the indirect jump reads a missing or displaced table.

RELOCATION RECORDS FOR [.rdata]:
OFFSET           TYPE                  VALUE
0000000000000018 IMAGE_REL_AMD64_REL32 .text
000000000000001c IMAGE_REL_AMD64_REL32 .text
...
000000000000007c IMAGE_REL_AMD64_REL32 .text

In the current PIC flow, raj_flow_pic.py compiles phases with -fno-jump-tables to reduce this pattern. For especially dense sources, -fno-tree-switch-conversion can be added as extra frontend/GIMPLE hardening.

-fno-jump-tables
-fno-tree-switch-conversion   # optional when tree/GIMPLE conversions must be blocked

Even then, the final criterion is still COFF verification: the movslq/add/jmp *reg pattern and jump-table relocations from .rdata into .text must not remain if the final binary is intended to be .text-only.
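
To reproduce the pattern outside the pipeline, a small dense switch is enough. The function below is a hypothetical test input, not part of the JellyBee sources. Whether a given compiler actually emits a jump table for it depends on the compiler version, target, and optimization level, so the object file has to be inspected rather than assumed.

```c
#include <assert.h>

/* Hypothetical test input: eight consecutive case labels with distinct
   bodies make the switch dense enough that an optimizing compiler may
   lower it to a jump table. */
int SketchDispatch( int Opcode, int Value )
{
    switch ( Opcode )
    {
        case 0:  return Value + 1;
        case 1:  return Value - 1;
        case 2:  return Value * 2;
        case 3:  return Value / 2;
        case 4:  return Value ^ 0x55;
        case 5:  return -Value;
        case 6:  return Value & 0x0F;
        case 7:  return Value | 0x80;
        default: return 0;
    }
}

/* The flags change code layout only; results must stay identical. */
void SketchDispatchSelfCheck( void )
{
    assert( SketchDispatch( 2, 21 ) == 42 );
    assert( SketchDispatch( 9, 21 ) == 0 );
}
```

Compiling this file once with -O2 and once with -O2 -fno-jump-tables, then comparing the output of objdump -dr on both objects, shows whether the indirect jmp and the read-only table relocations are present or whether the switch became a chain of comparisons.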

CRT in the Flow

The CRT is the layer that turns a freshly loaded process into a stable C environment before main runs. Its final effect is not merely "startup"; it establishes runtime invariants so application code behaves predictably.

At startup, the CRT initializes global C-library state, prepares execution contracts (argc/argv/envp), configures runtime pieces used by libc functions, and chains global initializers/constructors. At exit, it coordinates orderly termination (atexit, destructors, and runtime-managed resource cleanup).

Replacing it with a minimal CRT reduces surface area and runtime dependency, but it also removes that guarantee framework: anything that depends on implicit initialization is no longer covered. In practice, the binary gains fine-grained control over the flow, but it must explicitly and locally materialize every initialization/termination step previously handled by the full CRT.

Minimal CRT Limitations in PIC Mode

  • There is no full CRT initialization for global objects/constructors or standard termination paths (atexit and destructors).
  • There is no complete argument/environment layer with classic runtime behavior (argc/argv/envp normalization and associated helpers).
  • The full stdio/locale/printf library is not present; only the subset included in mincrt.h remains.
  • Code that depends on internal CRT symbols, such as __main or runtime initializers, no longer builds or runs without changes.

Data, Literals, and Imports When Converging to PIC/PILL

The block below materializes the literal on the stack. JbLitL11Raw reserves 9 local bytes inside pinit(), Ptr writes the text byte by byte, and JbLitL11 becomes a C pointer to the \0-terminated buffer. The blob therefore does not need a static .rdata reference for that string: the data is created at runtime inside the active frame and consumed while pinit() is still executing.

main_pill_phase9.c
unsigned char JbLitL11Raw[9];
char* JbLitL11 = ( char* ) JbLitL11Raw;
{
    volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL11Raw;
    Ptr[0] = 0x4A;
    Ptr[1] = 0x65;
    Ptr[2] = 0x6C;
    Ptr[3] = 0x6C;
    Ptr[4] = 0x79;
    Ptr[5] = 0x42;
    Ptr[6] = 0x65;
    Ptr[7] = 0x65;
    Ptr[8] = 0x00;
}

unsigned char JbLitL24Raw[4];
char* JbLitL24 = ( char* ) JbLitL24Raw;
{
    volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL24Raw;
    Ptr[0] = 0x50;
    Ptr[1] = 0x49;
    Ptr[2] = 0x43;
    Ptr[3] = 0x00;
}

The difference is clearer in the object file. In the upper area, the literal no longer appears as a pointer to static data: the compiler emits mov BYTE PTR [rsp+...] writes that rebuild JellyBee and PIC inside the active frame. In the lower area, the .rdata dump shows the same content when it lives as static data: 50494300 4a656c6c 79426565 00000000, meaning PIC\0JellyBee\0.

Comparison between a stack-rebuilt literal and the same literal in .rdata
Stack literal.
Literal in .rdata.
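
The .rdata dump quoted above can be decoded by hand, but a few lines of C make the mapping explicit. This is a minimal sketch with hypothetical helper names. It assumes the dump groups are printed in memory order, as objdump -s prints them.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of one hex digit, or -1 if C is not a hex digit. */
int SketchHexNibble( char C )
{
    if ( C >= '0' && C <= '9' ) return C - '0';
    if ( C >= 'a' && C <= 'f' ) return C - 'a' + 10;
    if ( C >= 'A' && C <= 'F' ) return C - 'A' + 10;
    return -1;
}

/* Decodes space-separated hex groups into Out. Returns the number of
   bytes written; stops at the first malformed pair or when Out is full. */
size_t SketchDecodeDump( const char* Text, unsigned char* Out, size_t Capacity )
{
    size_t Count = 0;
    assert( Text != NULL && Out != NULL );
    while ( *Text != '\0' && Count < Capacity )
    {
        int Hi, Lo;
        if ( *Text == ' ' ) { Text++; continue; }
        Hi = SketchHexNibble( Text[0] );
        if ( Hi < 0 ) break;
        Lo = SketchHexNibble( Text[1] );
        if ( Lo < 0 ) break;
        Out[Count++] = ( unsigned char ) ( ( Hi << 4 ) | Lo );
        Text += 2;
    }
    return Count;
}
```

Feeding it the string 50494300 4a656c6c 79426565 00000000 yields 16 bytes: P, I, C, a terminator, the eight characters of JellyBee, and zero padding.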

JB_PILL_LOG(): Controlled Pill Output

JB_PILL_LOG() is not a real blob function: it is a macro emitted by phase7 when log usage or minimal stdio is detected. The macro routes text into JbtoLogDispatch(), which writes into the JbSwarmPill buffer received by pinit().

In the final source this is visible directly in pill/main_pill_phase9.c: the macro is defined near the beginning of the file and later consumed like a normal call, for example JB_PILL_LOG( 1, JbPillCallCtx, JbLitL203 );.

main_pill_phase9.c
#include "runtime/mincrt.h"
#ifdef JB_PILL_LOG
#undef JB_PILL_LOG
#endif
#define JB_PILL_LOG(wide, out_ptr, fmt, ...) JbtoLogDispatch((out_ptr), (wide), (fmt), ##__VA_ARGS__)

...

if ( !BuildLdapFilter( pszSearchFilter, 260 * 2, filter ) )
{
    JB_PILL_LOG( 1, JbPillCallCtx, JbLitL203 );
    return _HRESULT_TYPEDEF_( 0x80070057 );
}

The real output channel is JbSwarmPill. The loader passes it to pinit() through RCX; JbtoBindOut( Pill ) validates the buffer, sets Length = 0, writes L'\0', and registers that context for the thread.

typedef struct
{
    wchar_t* Buffer;
    size_t   Capacity;
    size_t   Length;
    int      CtxArgc;
    char**   CtxArgv;
} JbSwarmPill;
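
The bind step described above can be sketched in a few lines. The struct is the one shown in this article; SketchBindOut and SketchAppend are hypothetical stand-ins and not the real JbtoBindOut or JbtoLogDispatch. The sketch assumes Capacity counts wchar_t elements rather than bytes, and it leaves out the per-thread registration because the article does not show how that is stored.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    wchar_t* Buffer;
    size_t   Capacity;
    size_t   Length;
    int      CtxArgc;
    char**   CtxArgv;
} JbSwarmPill;

/* Hypothetical bind step: validate, reset the length, terminate the buffer. */
int SketchBindOut( JbSwarmPill* Pill )
{
    if ( Pill == NULL || Pill->Buffer == NULL || Pill->Capacity == 0 )
    {
        return 0;
    }
    Pill->Length = 0;
    Pill->Buffer[0] = L'\0';
    return 1;
}

/* Hypothetical append: copies as much of Text as fits and keeps the
   buffer terminated. Returns the number of characters copied. */
size_t SketchAppend( JbSwarmPill* Pill, const wchar_t* Text )
{
    size_t Written = 0;
    assert( Pill != NULL && Pill->Buffer != NULL && Text != NULL );
    if ( Pill->Capacity == 0 )
    {
        return 0;
    }
    while ( Text[Written] != L'\0' && Pill->Length + 1 < Pill->Capacity )
    {
        Pill->Buffer[Pill->Length++] = Text[Written++];
    }
    Pill->Buffer[Pill->Length] = L'\0';
    return Written;
}
```

Truncating instead of failing keeps the output readable by the loader even when the blob logs more text than the buffer can hold.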

Literal Matrix and Log Pooling

*_output_Matrix_rebuild.json
{
  "tags": {
    "<a>": "[+] KLIST result: "
  }
}

The Log Pooling detector reads candidates from ast_scan.json. The current rule looks for literals that start with [+], [-], [*], [x], or [X]. A typical case would be:

printf("[+] KLIST result: ");
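
The prefix rule is simple enough to express directly. The following is a minimal sketch in C with hypothetical function names; the real detector works on ast_scan.json and may apply additional conditions that are not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when Literal starts with [+], [-], [*], [x], or [X]. */
int SketchIsLogLiteral( const char* Literal )
{
    if ( Literal == NULL || Literal[0] != '[' )
    {
        return 0;
    }
    switch ( Literal[1] )
    {
        case '+': case '-': case '*': case 'x': case 'X':
            break;
        default:
            return 0; /* also covers the one-character string "[" */
    }
    return Literal[2] == ']';
}

/* Counts how many candidate literals match the prefix rule. */
size_t SketchCountLogLiterals( const char* const* Literals, size_t Count )
{
    size_t Hits = 0;
    size_t Index;
    assert( Literals != NULL || Count == 0 );
    for ( Index = 0; Index < Count; Index++ )
    {
        if ( SketchIsLogLiteral( Literals[Index] ) )
        {
            Hits++;
        }
    }
    return Hits;
}
```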

From __imp_MessageBoxA to Hash-Based Resolution

phase1 normalizes the source and labels it with short markers so later phases do not depend on ambiguous searches. In the base case, it creates main_pill_phase1.c with API, literal, macro, and name tags:

main_pill_phase1.c
#include <windows.h>

int main( void )
{
    MessageBoxA/*<A1>*//*<N1>*/( NULL/*<M1>*/, "JellyBee"/*<L11>*//*<L1>*//*<L23>*//*<L3>*/, "PIC"/*<L24>*//*<L2>*/, MB_OK/*<M2>*/ );
    return 0;
}
correlation_phase1.json
{
  "targets": [
    {
      "tag": "<A1>",
      "kind": "API",
      "value": "MessageBoxA",
      "json_pointer": "/detections/api_calls/0",
      "matches": [
        {
          "file": "JB_msgbox_callspoof/qa/PIC_1/pill/main_pill_phase1.c",
          "offset": 46,
          "line": 6,
          "column": 5,
          "length": 11,
          "match_type": "identifier"
        }
      ]
    },
    {
      "tag": "<N1>",
      "kind": "NM",
      "value": "MessageBoxA",
      "json_pointer": "/detections/nm_undefined_clean/0",
      "matches": [
        {
          "file": "JB_msgbox_callspoof/qa/PIC_1/pill/main_pill_phase1.c",
          "offset": 46,
          "line": 6,
          "column": 5,
          "length": 11,
          "match_type": "identifier"
        }
      ]
    }
  ]
}

With those detections, phase4 builds the final API and DLL list. Then Phase4BuildHashesHeader emits runtime/api_hashes.h using the ror13-lowercase algorithm. For this case, the result is JB_PILL_HASH_MessageBoxA 0xFC4DA2D0u. The DLL list is emitted as JB_PILL_DLL_LIST using JB_PILL_STACK_STR(...). In this example, the associated module is user32.dll.

  ULONG_PTR ( *JbApiMessageBoxA )() = JB_PILL_APIHP( MessageBoxA, ULONG_PTR ( * )() );

  JbApiMessageBoxA( ( ( void * ) 0 ), JbLitL11, JbLitL24, __MSABI_LONG( 0x00000000 ) );

How hashes are consumed: the transformed source replaces the original call with JB_PILL_APIHP( MessageBoxA, ... ). That macro expands to JbRequireApiByHash( JB_PILL_HASH_MessageBoxA ), which calls JbResolveApi and finally JbFindExportByHash to compare the hash against the candidate module's EAT. If the export is a forwarder, JbFindExportByHash detects that the resolved address falls inside the export-directory range and branches into JbResolveForwarder. There, the DLL.Symbol string is parsed, Symbol is rehashed with the same ror13-lowercase scheme, and the final target is resolved with another call to JbFindExportByHash.


Classic PE, RVA, IAT, and Why It Is Removed

In a classic PE, MessageBoxA is not called through a fixed address inside the COFF object. MinGW compiles the call as an indirection through __imp_MessageBoxA. On x64 this appears as call qword ptr [rip + disp32]. That disp32 starts as zero, and the object carries an IMAGE_REL_AMD64_REL32 relocation against __imp_MessageBoxA.

(.../x86_64-w64-mingw32/lib/libuser32.a)(libuser32s00612.o)(__imp_MessageBoxA)
IMAGE_REL_AMD64_REL32 over the RIP-relative displacement of __imp_MessageBoxA.

The linker joins the import pieces into .idata. That is where the import descriptors, Import Lookup Table, Import Address Table, and hint/name entries appear. In PE/COFF those tables are expressed with RVAs. An RVA is not an absolute address; it is an offset from the image base. When the process loads the PE, the loader computes VA = ImageBase + RVA, loads the DLL, resolves the export, and writes the final function address into the IAT slot.

Thunk arithmetic (the RIP-relative indirect call or jmp is 6 bytes long):

RipNext         = Va(Call) + 6
IatSlotVa       = RipNext + Disp32PatchedByLinker
FinalJumpTarget = *(UINT64*)IatSlotVa

PE relocation, RVA, and IAT for the MessageBoxA example
Diagram 4. COFF, relocation, IAT slot, and hash resolution.
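
The address arithmetic above can be written out as a small worked example. This is a minimal sketch with hypothetical function names. The image base and RVAs in the usage note are made-up illustration values, not taken from a real binary.

```c
#include <assert.h>
#include <stdint.h>

/* FF 15 disp32 (call) and FF 25 disp32 (jmp) are both 6 bytes long. */
#define SKETCH_RIP_INDIRECT_LENGTH 6u

/* VA = ImageBase + RVA */
uint64_t SketchRvaToVa( uint64_t ImageBase, uint32_t Rva )
{
    return ImageBase + Rva;
}

/* IatSlotVa = RipNext + Disp32, where RipNext follows the instruction. */
uint64_t SketchIatSlotVa( uint64_t InstructionVa, int32_t Disp32 )
{
    uint64_t RipNext = InstructionVa + SKETCH_RIP_INDIRECT_LENGTH;
    return RipNext + ( uint64_t ) ( int64_t ) Disp32;
}

/* Inverse direction: the displacement the linker has to patch in. */
int32_t SketchDisp32For( uint64_t InstructionVa, uint64_t SlotVa )
{
    int64_t Delta = ( int64_t ) ( SlotVa - ( InstructionVa + SKETCH_RIP_INDIRECT_LENGTH ) );
    assert( Delta >= INT32_MIN && Delta <= INT32_MAX );
    return ( int32_t ) Delta;
}
```

With an image base of 0x140000000, a call at RVA 0x1010, and an IAT slot at RVA 0x8230, the patched displacement is 0x8230 - 0x1016 = 0x721A. Because the displacement is relative to the next instruction, the same bytes stay valid wherever the image is mapped; only the content of the IAT slot depends on the loader.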

That model depends on .idata, the PE loader, and IAT patching. In PILL, the target is a blob executable from the JellyBee loader without relying on static imports. Therefore .idata is discarded and the __imp_* path is replaced by local resolution through the PEB, the export directory, and hashes.

Synthetic Callstack with Synthetic Frames

The synthetic callstack is a selective transformation. phase4 receives a list of selected APIs from --spoof-api or --spoof-hash and only changes those calls. The wrapper can also materialize a phase5B and a derived ASM file, *_pill_phase5B_syntheticCallStack.asm, when the real spoofing flow is enabled.

The normal call resolves the pointer with JB_PILL_APIHP. When spoofing is enabled, the same hash is kept, but the invocation changes to JB_PILL_SPOOF_CALL_HASH. That macro prepares a JB_PILL_SPOOF_CFG_T configuration, stores arguments, and transfers control to the spoof_call_synthetic assembly stub.

main_pill_phase4.c
ULONG_PTR ( *JbApiMessageBoxA )() = JB_PILL_APIHP( MessageBoxA, ULONG_PTR ( * )() );

JB_PILL_SPOOF_CALL_HASH(
    ( ( JB_PILL_SPOOF_CFG_T ){0} ),
    JB_PILL_HASH_MessageBoxA,
    ( ( void * ) 0 ),
    JbLitL11,
    JbLitL24,
    __MSABI_LONG( 0x00000000 )
);

Participating Parts

  • phase4 decides whether a selected API can be transformed. If the call has no arguments or more than eight, it is left unspoofed.
  • runtime/spoof_bootstrap.h resolves modules, exports, gadgets, and unwind metadata.
  • transpilador/asm/syntheticCallStack.asm provides spoof_call_synthetic, parameter_handler_synthetic, and restore_synthetic.
  • JB_PILL_SPOOF_CFG_T carries frame addresses, gadgets, stack sizes, original return address, argument count, and up to eight arguments.

How Frames Are Prepared

The synthetic-frame implementation was developed using the klezVirus/SilentMoonwalk PoC as a reference, adapting dynamic call stack spoofing to the PIC/PILL runtime and hash-based API resolution.

First, kernel32 is located and the runtime searches for symbols that help build a credible stack. It works with BaseThreadInitThunk and RtlUserThreadStart, and it also looks for functions with valid unwind metadata. For each frame, the stack consumption of the prologue is calculated from .pdata and .xdata. That calculation avoids hard-coded frame sizes.

Then the runtime searches for specific gadgets. The flow uses a jmp rbx gadget and an add rsp, 0x38 gadget. With those pieces, it prepares a chain where the visible return does not point directly to pinit, but to an unwind-compatible frame path.

How It Executes

spoof_call_synthetic saves preserved registers, reserves a work area, places synthetic returns, and transfers control to the hash-resolved API. parameter_handler_synthetic distributes arguments following Win64: RCX, RDX, R8, R9, then stack space for additional arguments. When the API returns, restore_synthetic restores rsp, rbp, rbx, and r15.

The result in WinDbg is that the stack no longer shows a direct pinit -> API transition. The trace goes through frames with valid metadata, and the debugger can unwind it coherently.

Callstack Without Spoofing

The stack shows the blob's direct call-site.

Callstack With Active Spoofing

Synthetic frames with coherent unwind.