Abstractions, Aspects, and PIC Modularity: The Compiler Subversion 1
Compiler Internals
Swarm Factory
The goal of Swarm Factory in the system is to dissect projects and tools, compile selected functionality into PIC, and add capabilities and evasion layers. The aspect-oriented idea came later: during transformation, a capability can be isolated at a specific phase and composed back as a cross-cutting concern. That changed my initial assumptions about section handling.
Aspect-oriented programming is a paradigm for adding functionality to a base program from one controlled point, when that functionality would otherwise require code spread across many places. The classic example is event logging: the base program may have no integrated logging, while AOP provides a way to apply that concern uniformly across multiple points of interest.
This AOP framing was influenced by the Tradecraft Garden ecosystem developed by Raphael Mudge, the creator of Cobalt Strike and a Metasploit contributor. I will leave the broader analysis for later posts. First, we will examine our "compiler" in depth, what it returns, and how its telemetry helps us branch into different technologies.
Build JSON Artifacts
The build JSON artifacts act as contracts between the preparator, the transpilador, the linker, the graphical view, and the debugging layer. The current version consumes more structured outputs and relies less on heuristics: preparator detections, toolchain paths, the symbol-to-import-library index, AST metadata, the rewrite plan, the argument matrix, validation reports, and the final manifest.
- PROJECT_ROOT/
  - pill/build_manifest.json (wrapper manifest: effective source, tools, phases, hashes, and final bin)
- preparator/main_json/
  - report.json / report_advanced.json (source detections, per-phase artifacts, and API-DLL hints)
  - toolchain_scan.json (paths, sysroot, available libraries, and candidates)
  - importlib_symbol_index.json (symbol-to-import-library mapping)
- pill/
  - correlation_phase1.json (enriched report with correlated tags, detections, and targets)
  - ast_scan.json / rewrite_plan.json (lightweight AST, call graph, literals, hotspots, and planned decisions)
  - ast_validation.json / ast_schema.json / ast_phase_diff.json (validation and source -> phase1..phase9 diff)
  - ast_rewrite_apply.json / ast_size_report.json (wrapper-applied rewrites and final PIC section dependencies)
  - pillArgsInstMatrix.json (GIMPLE matrix used to rebuild argc/argv)
  - desglobalization.json (global state removed, moved, or rejected)
  - *_orphanApiValidation.json (final orphan-symbol classification)
  - *_output_Matrix_rebuild.json (literal/log tags rebuilt outside the blob)
- debug/
  - swarm_symbols_phase9.json (final blob offsets and symbols)
Native C to PIC Transformation, from Preprocessor to COFF
The kdRajKit/JellyBeeSystem-BlogLabs repository includes three test PIC payloads and a P2P loader for debugging the flow outside the article.
Starting Source Code
#include <windows.h>
int main( void )
{
MessageBoxA( NULL, "Jellybe", "PIC", MB_OK );
return 0;
}
Core Emitted After Rajpilator
The following block shows the core of the final output: the pinit() contract, hash-based API resolution, and literals materialized on the stack. The real file may include Win32 headers and auxiliary runtime blocks around it.
#include "runtime/api_hashes.h"
#include "runtime/pill.h"
typedef struct
{
wchar_t* Buffer;
size_t Capacity;
size_t Length;
} JbSwarmPill;
void pinit( JbSwarmPill* Pill );
void pinit( JbSwarmPill* Pill )
{
ULONG_PTR ( *JbApiMessageBoxA )() = JB_PILL_APIHP( MessageBoxA, ULONG_PTR ( * )() );
unsigned char JbLitL11Raw[9];
char* JbLitL11 = ( char* ) JbLitL11Raw;
{
volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL11Raw;
Ptr[0] = 0x4A;
Ptr[1] = 0x65;
Ptr[2] = 0x6C;
Ptr[3] = 0x6C;
Ptr[4] = 0x79;
Ptr[5] = 0x42;
Ptr[6] = 0x65;
Ptr[7] = 0x65;
Ptr[8] = 0x00;
}
unsigned char JbLitL24Raw[4];
char* JbLitL24 = ( char* ) JbLitL24Raw;
{
volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL24Raw;
Ptr[0] = 0x50;
Ptr[1] = 0x49;
Ptr[2] = 0x43;
Ptr[3] = 0x00;
}
JbApiMessageBoxA( ( ( void * ) 0 ), JbLitL11, JbLitL24, __MSABI_LONG( 0x00000000 ) );
return;
}
ABI Entry, Layout, and pinit Contract
PROJECT_ROOT/pill contains abi.asm and pill.ld.
The canonical generator lives in transpilador/app/transpilador.c; those files are materialized inside pill/ for each build.
Section Order
The linker script sets abi as the entrypoint, places .text.abi first, then .text.pinit, and then the remaining .text*. That order keeps the entry stub at the beginning of the blob and places pinit() immediately behind the ABI contract before the rest of the emitted code.
.data is preserved because it is not PE metadata or a loader convenience: it can hold initialized state that the code itself needs at runtime. If the compiler or the minimal runtime emits initialized globals, pointers, small tables, initialized buffers, or any writable object referenced from .text, discarding .data would break those references. The usual result would be a missing-memory access, a garbage read, or a write into an address that no longer matches the real blob layout.
ENTRY(abi)
SECTIONS
{
. = __image_base__ + 0x1000;
.text :
{
*(.text.abi)
*(.text.pinit)
*(.text*)
}
.data :
{
*(.data*)
}
/DISCARD/ : { *(.idata*) }
}
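As a minimal illustration of the kind of object that forces .data to survive, consider the sketch below. The names are hypothetical and are not taken from the generated runtime; the only assumption is that the compiler places initialized, writable globals in .data, which is the usual behavior for GCC and MinGW.

```c
#include <stddef.h>

/* Hypothetical initialized, writable state. A compiler such as GCC
   typically places objects like these in .data. */
static unsigned int SketchCounter = 41;
static unsigned char SketchTable[4] = { 0x10, 0x20, 0x30, 0x40 };

/* Code in .text that reads and writes the objects above. If .data were
   discarded at link time, these accesses would no longer match the
   real blob layout. */
unsigned int SketchNextCounter( void )
{
    SketchCounter += 1;
    return SketchCounter;
}

/* Reads from the initialized table; the index wraps around its size. */
unsigned char SketchLookup( size_t Index )
{
    return SketchTable[Index % 4];
}
```

Both functions only work if the initial values 41 and 0x10..0x40 are physically present in the blob, which is why the linker script keeps *(.data*).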
Generated ABI and Stack Alignment
The generated sequence appears once per build. The reading below breaks the stub down into concrete guarantees.
extern pinit
global abi
segment .text
abi:
push rdi ; save rdi because it is the working base register
mov rdi, rsp ; preserve the original stack pointer in rdi
and rsp, byte -0x10 ; align the stack to 16 bytes
sub rsp, 0x20 ; reserve x64 ABI shadow space
; RCX arrives from the caller and holds JbSwarmPill* when present
; do not touch RCX, so --ctx-args can work at runtime
call pinit ; call the C function
; EAX keeps the real pinit return value
mov rsp, rdi ; restore the original stack pointer
pop rdi ; restore rdi
ret ; return to the caller
Point-by-Point Reading
- RCX is not touched before call pinit; therefore it can carry JbSwarmPill*.
- RDX, R8, and R9 are not part of the base input contract; arguments and output travel inside Pill.
- RSP is aligned to 16 bytes and Win64 shadow space is reserved before entering C.
- RDI preserves the original stack value and is restored before returning.
- EAX keeps the real pinit() return value; the stub does not force it to zero.
- Any stack pivot, TEB manipulation, or alternative prologue must be introduced as its own ABI before assembly.
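The alignment guarantee can be checked with plain integer arithmetic. The sketch below is only a model of the three stack-adjusting instructions in the stub; the function names are hypothetical and nothing here is emitted by the transpilador.

```c
#include <stdint.h>

/* Mirrors "and rsp, -0x10": clears the low four bits so the value
   becomes a multiple of 16. */
uint64_t SketchAlignDown16( uint64_t Rsp )
{
    return Rsp & ~( uint64_t ) 0xF;
}

/* Models the stub up to the call: push rdi (8 bytes), align to 16,
   then reserve 0x20 bytes of shadow space. Returns the RSP value
   seen right before "call pinit". */
uint64_t SketchRspBeforeCall( uint64_t EntryRsp )
{
    uint64_t Rsp = EntryRsp - 8;        /* push rdi       */
    Rsp = SketchAlignDown16( Rsp );     /* and rsp, -0x10 */
    Rsp -= 0x20;                        /* sub rsp, 0x20  */
    return Rsp;
}
```

Because the and instruction discards the low bits, the result is a multiple of 16 whatever alignment the loader used when it transferred control to abi.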
Figure: abi.asm.
RCX Contract Toward pinit
After reading the stub, the important point is clear: abi aligns RSP and reserves shadow space, but it does not overwrite RCX. The pointer prepared by the loader can therefore cross intact into the C call.
pinit does not receive loose arguments through RDX, R8, or R9 in the base contract. Everything crosses through RCX as JbSwarmPill*: the output buffer, its capacity, the output counter, and, when applicable, the context argument vector.
Figure: RCX into pinit().
Minimal pinit Contract
- pinit(JbSwarmPill* Pill) is the stable C entrypoint that receives the pointer transported through RCX.
- The base output contract lives in Pill: Buffer, Capacity, and Length.
- JbtoBindOut( Pill ) initializes that output and makes it the active buffer of the minimal runtime.
- Optional arguments live in the Pill context, normally as CtxArgc and CtxArgv, and are consumed only if the loader marks them valid.
Args Hook and argv Instrumentation
The argument path is separate from the base contract. phase5A builds the minimal argc/argv environment only when the source needs it, reading the context from Pill and not from additional ABI registers. The transpilador accepts both flag names: --ctx-args and --args-ctx.
- In materialized-argument mode, param_N and argvLocal are generated inside the function.
- In context-argument mode, Pill->CtxArgc and Pill->CtxArgv are read only if the loader provides a valid vector.
- The context vector is accepted when Pill exists, CtxArgv exists, and CtxArgc is between 1 and 64.
- If the context is not valid, a minimal local fallback is kept for argc and argv.
- If the source uses Argc or Argv, local aliases are created from argc and argv.
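The acceptance rule in the list above can be sketched as a small predicate. This is not the code phase5A emits: the struct name, the function names, and the shape of the fallback (argc = 1 with a caller-supplied vector) are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical mirror of the Pill layout, including the context fields. */
typedef struct
{
    wchar_t* Buffer;
    size_t   Capacity;
    size_t   Length;
    int      CtxArgc;
    char**   CtxArgv;
} SketchPill;

#define SKETCH_MAX_CTX_ARGS 64

/* Accepts the context vector only when Pill exists, CtxArgv exists,
   and CtxArgc is between 1 and 64. */
int SketchCtxArgsValid( const SketchPill* Pill )
{
    return Pill != NULL
        && Pill->CtxArgv != NULL
        && Pill->CtxArgc >= 1
        && Pill->CtxArgc <= SKETCH_MAX_CTX_ARGS;
}

/* Chooses argc/argv: the context vector when valid, otherwise a
   minimal local fallback (assumed here to be a single entry). */
void SketchSelectArgs( const SketchPill* Pill, char** Fallback,
                       int* OutArgc, char*** OutArgv )
{
    if ( SketchCtxArgsValid( Pill ) )
    {
        *OutArgc = Pill->CtxArgc;
        *OutArgv = Pill->CtxArgv;
    }
    else
    {
        *OutArgc = 1;
        *OutArgv = Fallback;
    }
}
```

Keeping the check in one predicate means every later read of argc and argv sees either a fully validated vector or the fallback, never a half-valid context.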
Figure: argc/argv rebuilt in pinit().
Figure: --ctx-args: arguments after the .bin, output in console/WinDbg.
Control Structures and Unwanted Sections
Dense switch statements can be lowered into jump tables. In that case, the code no longer decides only with local comparisons: it computes an index, reads an entry from a table in .rdata, and jumps into a .text block. The typical object-level pattern is:
movslq (%rsi,%rdx,4), %rdx
add %rsi, %rdx
jmp *%rdx
The dependency proof appears in the relocations: .rdata contains relative entries into .text. If that section is removed, the indirect jump reads a missing or displaced table.
RELOCATION RECORDS FOR [.rdata]:
OFFSET           TYPE                  VALUE
0000000000000018 IMAGE_REL_AMD64_REL32 .text
000000000000001c IMAGE_REL_AMD64_REL32 .text
...
000000000000007c IMAGE_REL_AMD64_REL32 .text
In the current PIC flow, raj_flow_pic.py compiles phases with -fno-jump-tables to reduce this pattern. For especially dense sources, -fno-tree-switch-conversion can be added as extra frontend/GIMPLE hardening.
-fno-jump-tables
-fno-tree-switch-conversion # optional when tree/GIMPLE conversions must be blocked
Even then, the final criterion is still COFF verification: the movslq/add/jmp *reg pattern and jump-table relocations from .rdata into .text must not remain if the final binary is intended to be .text-only.
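For reference, this is the source shape that tends to trigger the problem. The example is hypothetical: a dense switch with contiguous labels is a typical candidate for a jump table, and because every case only selects a constant, it is also the kind of switch that GCC's switch conversion may turn into a lookup array. The second function is the same logic written as explicit comparisons.

```c
/* Dense, contiguous case labels that only select constants. With
   default optimization options the compiler may lower this into a
   table stored outside .text. */
int SketchDispatchSwitch( int Op )
{
    switch ( Op )
    {
        case 0:  return 10;
        case 1:  return 17;
        case 2:  return 23;
        case 3:  return 4;
        case 4:  return 42;
        case 5:  return 8;
        default: return -1;
    }
}

/* Same behavior expressed as a comparison chain. */
int SketchDispatchChain( int Op )
{
    if ( Op == 0 ) return 10;
    if ( Op == 1 ) return 17;
    if ( Op == 2 ) return 23;
    if ( Op == 3 ) return 4;
    if ( Op == 4 ) return 42;
    if ( Op == 5 ) return 8;
    return -1;
}
```

Rewriting the source this way is not a guarantee, since the optimizer is free to restructure either form. The compiler flags and the check on the COFF relocations remain the actual criterion.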
CRT in the Flow
The CRT is the layer that turns a freshly loaded process into a stable C environment before main runs. Its final effect is not merely "startup"; it establishes runtime invariants so application code behaves predictably.
At startup, the CRT initializes global C-library state, prepares execution contracts (argc/argv/envp), configures runtime pieces used by libc functions, and chains global initializers/constructors. At exit, it coordinates orderly termination (atexit, destructors, and runtime-managed resource cleanup).
Replacing it with a minimal CRT reduces surface area and runtime dependency, but it also removes that guarantee framework: anything that depends on implicit initialization is no longer covered. In practice, the binary gains fine-grained control over the flow, but it must explicitly and locally materialize every initialization/termination step previously handled by the full CRT.
Minimal CRT Limitations in PIC Mode
- There is no full CRT initialization for global objects/constructors or standard termination paths (atexit and destructors).
- There is no complete argument/environment layer with classic runtime behavior (argc/argv/envp normalization and associated helpers).
- The full stdio/locale/printf library is not present; only the subset included in mincrt.h remains.
- Code that depends on internal CRT symbols, such as __main or runtime initializers, is no longer portable.
Data, Literals, and Imports When Converging to PIC/PILL
This block materializes the literal on the stack. JbLitL11Raw reserves 8 local bytes inside pinit(), Ptr writes the text byte by byte, and JbLitL11 becomes a C pointer to the \0-terminated buffer. The blob therefore does not need a static .rdata reference for that string: the data is born at runtime inside the active frame and consumed while pinit() is still executing.
unsigned char JbLitL11Raw[8];
char* JbLitL11 = ( char* ) JbLitL11Raw;
{
volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL11Raw;
Ptr[0] = 0x4A;
Ptr[1] = 0x65;
Ptr[2] = 0x6C;
Ptr[3] = 0x6C;
Ptr[4] = 0x79;
Ptr[5] = 0x62;
Ptr[6] = 0x65;
Ptr[7] = 0x00;
}
unsigned char JbLitL24Raw[4];
char* JbLitL24 = ( char* ) JbLitL24Raw;
{
volatile unsigned char* Ptr = ( volatile unsigned char* ) JbLitL24Raw;
Ptr[0] = 0x50;
Ptr[1] = 0x49;
Ptr[2] = 0x43;
Ptr[3] = 0x00;
}
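To make the mechanical nature of that rewrite concrete, here is a minimal sketch of an emitter that turns a string into the byte-by-byte assignments shown above. It is a hypothetical helper, not the transpilador's implementation: the function name, the buffer-based interface, and the exact output text are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter: writes C text that rebuilds Text on the stack,
   one byte per assignment, including the terminating 0x00.
   Returns the number of characters written, or -1 if Out is too small. */
int SketchEmitStackLiteral( const char* Tag, const char* Text,
                            char* Out, size_t Cap )
{
    size_t Len  = strlen( Text ) + 1;   /* count the terminator */
    size_t Used = 0;
    size_t i;
    int    N;

    N = snprintf( Out, Cap, "unsigned char JbLit%sRaw[%u];\n",
                  Tag, ( unsigned ) Len );
    if ( N < 0 || ( size_t ) N >= Cap ) return -1;
    Used = ( size_t ) N;

    for ( i = 0; i < Len; i++ )
    {
        N = snprintf( Out + Used, Cap - Used, "Ptr[%u] = 0x%02X;\n",
                      ( unsigned ) i,
                      ( unsigned ) ( unsigned char ) Text[i] );
        if ( N < 0 || ( size_t ) N >= Cap - Used ) return -1;
        Used += ( size_t ) N;
    }
    return ( int ) Used;
}
```

Calling it with the tag L24 and the text PIC yields a four-byte array declaration followed by four Ptr[...] assignments, matching the second literal in the listing.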
The difference is clearer in the object file. In the upper area, the literal no longer appears as a pointer to static data: the compiler emits mov BYTE PTR [rsp+...] writes that rebuild JellyBee and PIC inside the active frame. In the lower area, the .rdata dump shows the same content when it lives as static data: 50494300 4a656c6c 79426565 00000000, meaning PIC\0JellyBee\0.
Figure: .rdata.
JB_PILL_LOG(): Controlled Pill Output
JB_PILL_LOG() is not a real blob function: it is a macro emitted by phase7 when log usage or minimal stdio is detected. The macro routes text into JbtoLogDispatch(), which writes into the JbSwarmPill buffer received by pinit().
In the final source this is visible directly in pill/main_pill_phase9.c: the macro is defined near the beginning of the file and later consumed like a normal call, for example JB_PILL_LOG( 1, JbPillCallCtx, JbLitL203 );.
#include "runtime/mincrt.h"
#ifdef JB_PILL_LOG
#undef JB_PILL_LOG
#endif
#define JB_PILL_LOG(wide, out_ptr, fmt, ...) JbtoLogDispatch((out_ptr), (wide), (fmt), ##__VA_ARGS__)
...
if ( !BuildLdapFilter( pszSearchFilter, 260 * 2, filter ) )
{
JB_PILL_LOG( 1, JbPillCallCtx, JbLitL203 );
return _HRESULT_TYPEDEF_( 0x80070057 );
}
The real output channel is JbSwarmPill. The loader passes it to pinit() through RCX; JbtoBindOut( Pill ) validates the buffer, sets Length = 0, writes L'\0', and registers that context for the thread.
typedef struct
{
wchar_t* Buffer;
size_t Capacity;
size_t Length;
int CtxArgc;
char** CtxArgv;
} JbSwarmPill;
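The binding step described above can be sketched as follows. This is an approximation built only from the prose: the names are hypothetical, the per-thread registration is omitted, and treating Capacity as a count of wchar_t elements is an assumption.

```c
#include <stddef.h>

/* Hypothetical mirror of the output contract. */
typedef struct
{
    wchar_t* Buffer;
    size_t   Capacity;   /* assumed to be counted in wchar_t units */
    size_t   Length;
    int      CtxArgc;
    char**   CtxArgv;
} SketchOutPill;

/* Validates the buffer, sets Length = 0, and writes L'\0'.
   Returns 1 on success and 0 when the buffer is unusable. */
int SketchBindOut( SketchOutPill* Pill )
{
    if ( Pill == NULL || Pill->Buffer == NULL || Pill->Capacity == 0 )
        return 0;
    Pill->Length = 0;
    Pill->Buffer[0] = L'\0';
    return 1;
}

/* Appends wide text, truncating so the terminator always fits.
   Returns the number of characters actually copied. */
size_t SketchAppend( SketchOutPill* Pill, const wchar_t* Text )
{
    size_t Copied = 0;

    if ( Pill == NULL || Pill->Buffer == NULL || Text == NULL )
        return 0;
    if ( Pill->Length >= Pill->Capacity )
        return 0;

    while ( Text[Copied] != L'\0' && Pill->Length + 1 < Pill->Capacity )
    {
        Pill->Buffer[Pill->Length++] = Text[Copied++];
    }
    Pill->Buffer[Pill->Length] = L'\0';
    return Copied;
}
```

The point of the design is that the blob never owns the output memory: the loader allocates it, and the pill only writes inside the bounds that Capacity declares.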
Literal Matrix and Log Pooling
{
"tags": {
"<a>": "[+] KLIST result: "
}
}
The Log Pooling detector reads candidates from ast_scan.json. The current rule looks for literals that start with [+], [-], [*], [x], or [X]. A typical case would be:
printf("[+] KLIST result: ");
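The prefix rule is simple enough to express directly. The function below is a hypothetical sketch of that check; the real detector works on candidates read from ast_scan.json, which is not modeled here.

```c
#include <string.h>

/* Returns 1 when the literal starts with one of the log markers
   [+], [-], [*], [x], or [X]; returns 0 otherwise. */
int SketchIsLogLiteral( const char* Literal )
{
    static const char* const Prefixes[] = { "[+]", "[-]", "[*]", "[x]", "[X]" };
    size_t i;

    if ( Literal == NULL ) return 0;

    for ( i = 0; i < sizeof( Prefixes ) / sizeof( Prefixes[0] ); i++ )
    {
        if ( strncmp( Literal, Prefixes[i], 3 ) == 0 ) return 1;
    }
    return 0;
}
```

With this rule, "[+] KLIST result: " qualifies for pooling, while a literal such as "PIC" does not.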
From __imp_MessageBoxA to Hash-Based Resolution
phase1 normalizes the source and labels it with short markers so later phases do not depend on ambiguous searches. In the base case, it creates main_pill_phase1.c with API, literal, macro, and name tags:
#include <windows.h>
int main( void )
{
MessageBoxA/*<A1>*//*<N1>*/( NULL/*<M1>*/, "Jellybe"/*<L11>*//*<L1>*//*<L23>*//*<L3>*/, "PIC"/*<L24>*//*<L2>*/, MB_OK/*<M2>*/ );
return 0;
}
{
"targets": [
{
"tag": "<A1>",
"kind": "API",
"value": "MessageBoxA",
"json_pointer": "/detections/api_calls/0",
"matches": [
{
"file": "JB_msgbox_callspoof/qa/PIC_1/pill/main_pill_phase1.c",
"offset": 46,
"line": 6,
"column": 5,
"length": 11,
"match_type": "identifier"
}
]
},
{
"tag": "<N1>",
"kind": "NM",
"value": "MessageBoxA",
"json_pointer": "/detections/nm_undefined_clean/0",
"matches": [
{
"file": "JB_msgbox_callspoof/qa/PIC_1/pill/main_pill_phase1.c",
"offset": 46,
"line": 6,
"column": 5,
"length": 11,
"match_type": "identifier"
}
]
}
]
}
With those detections, phase4 builds the final API and DLL list. Then Phase4BuildHashesHeader emits runtime/api_hashes.h using the ror13-lowercase algorithm. For this case, the result is JB_PILL_HASH_MessageBoxA 0xFC4DA2D0u. The DLL list is emitted as JB_PILL_DLL_LIST using JB_PILL_STACK_STR(...). In this example, the associated module is user32.dll.
ULONG_PTR ( *JbApiMessageBoxA )() = JB_PILL_APIHP( MessageBoxA, ULONG_PTR ( * )() );
JbApiMessageBoxA( ( ( void * ) 0 ), JbLitL11, JbLitL24, __MSABI_LONG( 0x00000000 ) );
How hashes are consumed: the transformed source replaces the original call with JB_PILL_APIHP( MessageBoxA, ... ). That macro falls into JbRequireApiByHash( JB_PILL_HASH_MessageBoxA ), then into JbResolveApi, and finally into JbFindExportByHash to compare the hash against the candidate module's EAT. If the export is a forwarder, JbFindExportByHash detects that the address falls inside the export-directory range and branches into JbResolveForwarder. There, DLL.Symbol is parsed, Symbol is rehashed with the same ror13-lowercase scheme, and the final target is resolved with another call to JbFindExportByHash.
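The building blocks of that chain can be sketched in portable C. The hash below is the common rotate-right-by-13 scheme with ASCII lowercasing. The article does not show the exact variant used by Phase4BuildHashesHeader (rotation order, terminator handling, seeding), so this sketch is not expected to reproduce the constant in api_hashes.h. All function names are hypothetical, and ordinal forwarders are not handled.

```c
#include <stdint.h>
#include <string.h>

static uint32_t SketchRor32( uint32_t Value, unsigned Bits )
{
    /* Only called with Bits = 13, so both shifts are well defined. */
    return ( Value >> Bits ) | ( Value << ( 32u - Bits ) );
}

/* Assumed variant: rotate first, then add the lowercased byte. */
uint32_t SketchRor13Lower( const char* Name )
{
    uint32_t Hash = 0;

    for ( ; *Name != '\0'; Name++ )
    {
        unsigned char Ch = ( unsigned char ) *Name;
        if ( Ch >= 'A' && Ch <= 'Z' ) Ch = ( unsigned char ) ( Ch + ( 'a' - 'A' ) );
        Hash = SketchRor32( Hash, 13 );
        Hash += Ch;
    }
    return Hash;
}

/* An export is a forwarder when its RVA points inside the export
   directory itself instead of into code. */
int SketchIsForwarder( uint32_t ExportRva, uint32_t DirRva, uint32_t DirSize )
{
    return ExportRva >= DirRva && ExportRva - DirRva < DirSize;
}

/* Returns the Symbol part of a "DLL.Symbol" forwarder string,
   or NULL when the string has no usable symbol. */
const char* SketchForwarderSymbol( const char* Forwarder )
{
    const char* Dot = strrchr( Forwarder, '.' );
    return ( Dot != NULL && Dot[1] != '\0' ) ? Dot + 1 : NULL;
}
```

Lowercasing before hashing makes the comparison independent of the case used in the export table, so a single constant per API is enough.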
Classic PE, RVA, IAT, and Why It Is Removed
In a classic PE, MessageBoxA is not called through a fixed address inside the COFF object. MinGW compiles the call as an indirection through __imp_MessageBoxA. On x64 this appears as call qword ptr [rip + disp32]. That disp32 starts as zero, and the object carries an IMAGE_REL_AMD64_REL32 relocation against __imp_MessageBoxA.
(.../x86_64-w64-mingw32/lib/libuser32.a)(libuser32s00612.o)(__imp_MessageBoxA)
Figure: IMAGE_REL_AMD64_REL32 over the RIP-relative displacement of __imp_MessageBoxA.
The linker joins the import pieces into .idata. That is where the import descriptors, Import Lookup Table, Import Address Table, and hint/name entries appear. In PE/COFF those tables are expressed with RVAs. An RVA is not an absolute address; it is an offset from the image base. When the process loads the PE, the loader computes VA = ImageBase + RVA, loads the DLL, resolves the export, and writes the final function address into the IAT slot.
Bytes thunk:
RipNext = Va(Call) + 6
IatSlotVa = RipNext + Disp32PatchedByLinker
FinalJumpTarget = *(UINT64*)IatSlotVa
That model depends on .idata, the PE loader, and IAT patching. In PILL, the target is a blob executable from the JellyBee loader without relying on static imports. Therefore .idata is discarded and the __imp_* path is replaced by local resolution through the PEB, the export directory, and hashes.
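The address arithmetic of the classic model is worth seeing once with numbers. The sketch below only reproduces the RipNext and IatSlotVa formulas and the VA = ImageBase + RVA rule; the function names and the sample addresses in the usage note are hypothetical.

```c
#include <stdint.h>

/* call qword ptr [rip + disp32] encodes as FF 15 followed by a
   4-byte displacement, 6 bytes in total. */
#define SKETCH_CALL_RIP_LEN 6u

/* RipNext = Va(Call) + 6; IatSlotVa = RipNext + disp32.
   The displacement is signed, so it is sign-extended before adding. */
uint64_t SketchIatSlotVa( uint64_t CallVa, int32_t Disp32 )
{
    uint64_t RipNext = CallVa + SKETCH_CALL_RIP_LEN;
    return RipNext + ( uint64_t ) ( int64_t ) Disp32;
}

/* VA = ImageBase + RVA, as the PE loader computes it. */
uint64_t SketchRvaToVa( uint64_t ImageBase, uint32_t Rva )
{
    return ImageBase + Rva;
}
```

For a call at 0x140001000 with a patched displacement of 0x2FFA, the slot lands at 0x140004000, which is the same address as RVA 0x4000 in an image based at 0x140000000. None of this arithmetic survives in PILL, because there is no IAT slot for the displacement to point at.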
Synthetic Callstack with Synthetic Frames
The synthetic callstack is a selective transformation. phase4 receives a list of selected APIs from --spoof-api or --spoof-hash and only changes those calls. The wrapper can also materialize a phase5B and a derived ASM file, *_pill_phase5B_syntheticCallStack.asm, when the real spoofing flow is enabled.
The normal call resolves the pointer with JB_PILL_APIHP. When spoofing is enabled, the same hash is kept, but the invocation changes to JB_PILL_SPOOF_CALL_HASH. That macro prepares a JB_PILL_SPOOF_CFG_T configuration, stores arguments, and transfers control to the spoof_call_synthetic assembly stub.
ULONG_PTR ( *JbApiMessageBoxA )() = JB_PILL_APIHP( MessageBoxA, ULONG_PTR ( * )() );
JB_PILL_SPOOF_CALL_HASH(
( ( JB_PILL_SPOOF_CFG_T ){0} ),
JB_PILL_HASH_MessageBoxA,
( ( void * ) 0 ),
JbLitL11,
JbLitL24,
__MSABI_LONG( 0x00000000 )
);
Participating Parts
- phase4 decides whether a selected API can be transformed. If arity is zero or greater than eight arguments, the call is left unspoofed.
- runtime/spoof_bootstrap.h resolves modules, exports, gadgets, and unwind metadata.
- transpilador/asm/syntheticCallStack.asm provides spoof_call_synthetic, parameter_handler_synthetic, and restore_synthetic.
- JB_PILL_SPOOF_CFG_T carries frame addresses, gadgets, stack sizes, original return address, argument count, and up to eight arguments.
How Frames Are Prepared
The synthetic-frame implementation was developed using the klezVirus/SilentMoonwalk PoC as a reference, adapting dynamic call stack spoofing to the PIC/PILL runtime and hash-based API resolution.
First, kernel32 is located and useful symbols are searched to build a credible stack. The runtime works with BaseThreadInitThunk and RtlUserThreadStart, and it also looks for functions with valid unwind metadata. For each frame, the prologue stack consumption is calculated using .pdata and .xdata. That calculation avoids hard-coded frame sizes.
Then specific gadgets are searched. The flow uses a jmp rbx gadget and an add rsp, 0x38 gadget. With those pieces, it prepares a chain where the visible return does not point directly to pinit, but to an unwind-compatible frame path.
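The .pdata/.xdata calculation mentioned above amounts to walking a function's unwind codes and adding up what each one takes from the stack. The sketch below follows the documented x64 UNWIND_CODE layout but is not the runtime's implementation: the names are hypothetical, the codes are passed in as raw 16-bit slots, and chained unwind info and frame-pointer adjustments are ignored.

```c
#include <stdint.h>
#include <stddef.h>

/* Unwind operation codes from the x64 exception-handling format. */
enum
{
    SK_UWOP_PUSH_NONVOL     = 0,
    SK_UWOP_ALLOC_LARGE     = 1,
    SK_UWOP_ALLOC_SMALL     = 2,
    SK_UWOP_SET_FPREG       = 3,
    SK_UWOP_SAVE_NONVOL     = 4,
    SK_UWOP_SAVE_NONVOL_FAR = 5,
    SK_UWOP_SAVE_XMM128     = 8,
    SK_UWOP_SAVE_XMM128_FAR = 9,
    SK_UWOP_PUSH_MACHFRAME  = 10
};

/* Slot layout: bits 0-7 prologue offset, bits 8-11 operation,
   bits 12-15 operation info. */
static unsigned SketchOp( uint16_t Slot )   { return ( Slot >> 8 ) & 0xFu; }
static unsigned SketchInfo( uint16_t Slot ) { return ( Slot >> 12 ) & 0xFu; }

/* Returns the bytes of stack consumed by the prologue, excluding the
   return address, or -1 for a truncated or unknown code sequence. */
int64_t SketchPrologStackSize( const uint16_t* Codes, size_t Count )
{
    int64_t Total = 0;
    size_t  i = 0;

    while ( i < Count )
    {
        unsigned Op    = SketchOp( Codes[i] );
        unsigned Info  = SketchInfo( Codes[i] );
        size_t   Slots = 1;

        switch ( Op )
        {
            case SK_UWOP_PUSH_NONVOL:
                Total += 8;
                break;
            case SK_UWOP_ALLOC_SMALL:
                Total += ( int64_t ) Info * 8 + 8;
                break;
            case SK_UWOP_ALLOC_LARGE:
                Slots = ( Info == 0 ) ? 2 : 3;
                if ( i + Slots > Count ) return -1;
                if ( Info == 0 )
                    Total += ( int64_t ) Codes[i + 1] * 8;
                else
                    Total += ( int64_t ) ( ( uint32_t ) Codes[i + 1]
                                         | ( ( uint32_t ) Codes[i + 2] << 16 ) );
                break;
            case SK_UWOP_SET_FPREG:
                break;                      /* no stack consumed */
            case SK_UWOP_SAVE_NONVOL:
            case SK_UWOP_SAVE_XMM128:
                Slots = 2;                  /* stores into existing space */
                break;
            case SK_UWOP_SAVE_NONVOL_FAR:
            case SK_UWOP_SAVE_XMM128_FAR:
                Slots = 3;
                break;
            case SK_UWOP_PUSH_MACHFRAME:
                Total += ( Info == 0 ) ? 0x28 : 0x30;
                break;
            default:
                return -1;
        }

        if ( i + Slots > Count ) return -1;
        i += Slots;
    }
    return Total;
}
```

A prologue such as push rdi followed by sub rsp, 0x20 is described by one UWOP_PUSH_NONVOL and one UWOP_ALLOC_SMALL with info 3, which this walk sums to 40 bytes. Deriving the size from metadata in this way is what lets the frame sizes follow whatever system DLL build is loaded.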
How It Executes
spoof_call_synthetic saves preserved registers, reserves a work area, places synthetic returns, and transfers control to the hash-resolved API. parameter_handler_synthetic distributes arguments following Win64: RCX, RDX, R8, R9, then stack space for additional arguments. When the API returns, restore_synthetic restores rsp, rbp, rbx, and r15.
The result in WinDbg is that the stack no longer shows a direct pinit -> API transition. The trace goes through frames with valid metadata, and the debugger can unwind it coherently.
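The argument distribution and the arity limit described in this section can be modeled in a few lines. This sketch covers only the plain Win64 rule for integer and pointer arguments; the helper names are hypothetical, and it says nothing about how parameter_handler_synthetic lays out its own work area.

```c
#include <stddef.h>

#define SKETCH_MAX_SPOOF_ARGS 8

/* Mirrors the phase4 rule: arity zero or above eight is left unspoofed. */
int SketchSpoofEligible( int Arity )
{
    return Arity >= 1 && Arity <= SKETCH_MAX_SPOOF_ARGS;
}

/* First four integer/pointer arguments travel in registers.
   Returns the register name, or NULL for stack arguments. */
const char* SketchArgRegister( int Index )
{
    static const char* const Regs[4] = { "RCX", "RDX", "R8", "R9" };
    return ( Index >= 0 && Index < 4 ) ? Regs[Index] : NULL;
}

/* Arguments from the fifth onward sit above the 0x20-byte shadow
   space. Returns the offset from RSP at the call instruction,
   or -1 for register arguments. */
int SketchArgStackOffset( int Index )
{
    if ( Index < 4 ) return -1;
    return 0x20 + 8 * ( Index - 4 );
}
```

With the eight-argument limit, the last stack argument sits at RSP + 0x38 at the call site, so the outgoing area a caller must reserve is 0x40 bytes including shadow space.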
Callstack Without Spoofing
Callstack With Active Spoofing