One issue in working with ELF binaries is that they include a .init section and a .fini section. The .init section provides executable code for the initialization of the program. Since ELF assumes it is working in a multiprogramming environment, it uses this code to save registers and other system state information. Any shared object file included in the program also has an opportunity to run its initialization code before the call to the main program. Thus, when a shared library is used, a large amount of startup code can be executed before the main program is even called. Since the simulator does not currently simulate a multiprogramming environment, this superfluous code does not need to be executed. The difficulty is that the pointer to the first executable instruction, which ELF supplies in the ELF header shown in Figure 2.6, references the beginning of the .init section rather than the main program.
Another issue is that there is no direct way to tell when the main program ends. The main program is just a function called from the .init code; when main terminates, control returns to that location and then proceeds to call the .fini code, which holds executable instructions that contribute to process termination. This code is also extraneous and does not need to be run in the simulator. However, if this code is bypassed to improve performance, the question becomes how to stop the simulation. The HLT instruction executes after the return from the .fini code, so if the .init code is eliminated, it becomes difficult to know when the main function has truly ended.
For the reasons mentioned above, we decided that the first simulated instruction should be the first instruction in the main procedure. The label for the main procedure has the same virtual address as the first instruction in the main procedure. To find the virtual address of the main label, we parsed the ELF executable file, found the symbol table section, and retrieved the virtual address. In the same way, we retrieved the virtual address for the printf label, because, for reasons also explained above, we wanted to be able to identify when a call to printf occurred so that we could handle the request locally. We describe how we obtained virtual addresses for the main and printf labels in the remainder of this section.
Figure 3.7 illustrates getMainPrintf(), the function that we used in Sim386 to determine the virtual addresses for the printf and main labels. First, we retrieve the data for the section that contains the null-terminated string names of the sections. Next, we traverse all sections looking for the sections named ``.strtab'' and ``.symtab''. The .symtab section is an array of structures, where each structure describes a symbol in the executable. Figure 3.8 shows the fields of the structures in the symbol table; for example, the first field in the structure is st_name, an index into the .strtab section. The .strtab section contains the null-terminated string representations of the names of the symbols in the symbol table. Once these sections are found, we retrieve their associated data with the ELF access library function elf_getdata(). Lastly, we search through the symbol table entries looking for ``printf'' and ``main'', recording the field st_value when they are found; note that st_value is the second field in the structure depicted in Figure 3.8. The st_value of printf is saved so that it can be compared against the call target later, whenever a CALL instruction is simulated.
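The final search step can be sketched in C as follows. This is a minimal illustration and not the getMainPrintf() code from Figure 3.7: it assumes the contents of .symtab and .strtab have already been read into memory (for example with elf_getdata()), and the names SimSym and find_symbol_value are hypothetical. The structure mirrors the 32-bit ELF symbol table entry layout, with st_name first and st_value second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit ELF symbol table entry. */
typedef struct {
    uint32_t      st_name;   /* byte offset of the symbol name in .strtab */
    uint32_t      st_value;  /* virtual address associated with the symbol */
    uint32_t      st_size;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
} SimSym;

/*
 * Search the symbol table for `name`. On success, store the symbol's
 * st_value in *value_out and return 1; return 0 if no entry matches.
 * Entries whose name offset is out of range, or whose name is not
 * null-terminated inside the string table, are skipped.
 */
int find_symbol_value(const SimSym *symtab, size_t nsyms,
                      const char *strtab, size_t strtab_size,
                      const char *name, uint32_t *value_out)
{
    assert(symtab != NULL && strtab != NULL);
    assert(name != NULL && value_out != NULL);

    for (size_t i = 0; i < nsyms; i++) {
        uint32_t off = symtab[i].st_name;

        if (off >= strtab_size)
            continue;  /* name offset lies outside .strtab */
        if (memchr(strtab + off, '\0', strtab_size - off) == NULL)
            continue;  /* name is not terminated within .strtab */

        if (strcmp(strtab + off, name) == 0) {
            *value_out = symtab[i].st_value;
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, the function would be called once with "main" and once with "printf", and the two returned addresses kept for use by the loader and by the CALL handler.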
To run the main procedure, we insert a CALL instruction that calls the main procedure, using the st_value already obtained. The call to main is placed at the beginning of the text segment that we loaded into our simulator, and the EIP is set accordingly. We then place a HLT instruction immediately after the call to main so that the program will terminate after returning from main. Figure 3.9 illustrates the sequence of instructions that is added when Sim386 loads an ELF executable. Thus, the first instruction executed in our simulator is a call to the main procedure; the main procedure executes and returns to the next instruction, the HLT instruction. The execution of the HLT instruction then signals the simulator that the simulated program has completed.
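As a rough illustration of this two-instruction sequence, the sketch below encodes it as raw IA-32 machine code. It is not taken from Sim386: it assumes the near relative form of CALL (opcode 0xE8 followed by a 32-bit little-endian displacement measured from the end of the CALL instruction) and the one-byte HLT opcode 0xF4. The function name build_entry_stub and the parameter stub_addr, the virtual address at which the sequence is placed, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OPCODE_CALL_REL32 = 0xE8,  /* CALL with 32-bit relative displacement */
    OPCODE_HLT        = 0xF4,  /* HLT */
    CALL_REL32_SIZE   = 5,     /* opcode byte + 4 displacement bytes */
    ENTRY_STUB_SIZE   = 6      /* CALL rel32 followed by HLT */
};

/*
 * Write "CALL main; HLT" into out[0..5].
 * stub_addr: virtual address where out[0] will live (assumed parameter).
 * main_addr: st_value recorded for the main symbol.
 */
void build_entry_stub(uint8_t out[ENTRY_STUB_SIZE],
                      uint32_t stub_addr, uint32_t main_addr)
{
    assert(out != NULL);

    /* The displacement is relative to the instruction after the CALL.
     * Unsigned wraparound gives the correct two's-complement value
     * when main lies below the stub. */
    uint32_t rel = main_addr - (stub_addr + CALL_REL32_SIZE);

    out[0] = OPCODE_CALL_REL32;
    out[1] = (uint8_t)(rel & 0xFFu);          /* little-endian displacement */
    out[2] = (uint8_t)((rel >> 8) & 0xFFu);
    out[3] = (uint8_t)((rel >> 16) & 0xFFu);
    out[4] = (uint8_t)((rel >> 24) & 0xFFu);
    out[5] = OPCODE_HLT;                      /* reached when main returns */
}
```

For example, with the sequence placed at 0x08048000 and main at 0x080483d0, the displacement is 0x3cb and the six bytes are E8 CB 03 00 00 F4. The EIP would then start at stub_addr, and the return address pushed by the CALL is stub_addr + 5, the address of the HLT.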