Re: [PATCH v2] libbpf: Improve version handling when attaching uprobe

From: Espen Grindhaug
Date: Mon May 01 2023 - 09:01:09 EST


On Thu, Apr 27, 2023 at 06:19:29PM -0700, Yonghong Song wrote:
>
>
> On 4/27/23 12:19 PM, Espen Grindhaug wrote:
> > On Wed, Apr 26, 2023 at 02:47:27PM -0700, Yonghong Song wrote:
> > >
> > >
> > > On 4/23/23 11:55 AM, Espen Grindhaug wrote:
> > > > This change fixes the handling of versions in elf_find_func_offset.
> > > > In the previous implementation, we incorrectly assumed that the
> > >
> > > Could you give more explanation/example in the commit message
> > > what does 'incorrectly' mean here? In which situations the
> > > current libbpf implementation will not be correct?
> > >
> >
> > How about something like this?
> >
> >
> > libbpf: Improve version handling when attaching uprobe
> >
> > This change fixes the handling of versions in elf_find_func_offset.
> >
> > For example, let's assume we are trying to attach a uprobe to pthread_create in
> > glibc. Prior to this commit, it would fail with the error message 'elf:
> > ambiguous match [...]' because there are two entries in the symbol table with
> > that name.
> >
> > $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep pthread_create
> > 0000000000094cc0 T pthread_create@GLIBC_2.2.5
> > 0000000000094cc0 T pthread_create@@GLIBC_2.34
> >
> > So we go ahead and modify our code to attach to 'pthread_create@@GLIBC_2.34',
> > and this also fails, but this time with the error 'elf: failed to find symbol
> > [...]'. This fails because we incorrectly assumed that the version information
> > would be present in the string found in the string table, but there is only the
> > string 'pthread_create'.
>
> I tried one example with my centos8 libpthread library.
>
> $ llvm-readelf -s /lib64/libc-2.28.so | grep pthread_cond_signal
>   39: 0000000000095f70 43 FUNC GLOBAL DEFAULT 14 pthread_cond_signal@@GLIBC_2.3.2
>   40: 0000000000096250 43 FUNC GLOBAL DEFAULT 14 pthread_cond_signal@GLIBC_2.2.5
> 3160: 0000000000096250 43 FUNC LOCAL  DEFAULT 14 __pthread_cond_signal_2_0
> 3589: 0000000000095f70 43 FUNC LOCAL  DEFAULT 14 __pthread_cond_signal
> 5522: 0000000000095f70 43 FUNC GLOBAL DEFAULT 14 pthread_cond_signal@@GLIBC_2.3.2
> 5545: 0000000000096250 43 FUNC GLOBAL DEFAULT 14 pthread_cond_signal@GLIBC_2.2.5
> $ nm -D /lib64/libc-2.28.so | grep pthread_cond_signal
> 0000000000095f70 T pthread_cond_signal@@GLIBC_2.3.2
> 0000000000096250 T pthread_cond_signal@GLIBC_2.2.5
> $
>
> Note that the two pthread_cond_signal functions have different addresses,
> which is expected as they are implemented for different versions.
>
> But in your case,
> > $ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep pthread_create
> > 0000000000094cc0 T pthread_create@GLIBC_2.2.5
> > 0000000000094cc0 T pthread_create@@GLIBC_2.34
>
> The two functions have the same address, which is very weird. I suspect
> there is some issue here that at least needs investigation.
>

I am no expert on this, but as far as I can tell, this is normal,
although much more common on my Ubuntu machine than my Fedora machine.

Script to find duplicates:

nm -D /usr/lib64/libc-2.33.so | awk '
{
    addr = $1;
    symbol = $3;
    # Strip the @VERSION or @@VERSION suffix to get the bare symbol name.
    sub(/[@].*$/, "", symbol);

    # Same address and same bare name as the previous line: a duplicate.
    if (addr == prev_addr && symbol == prev_symbol) {
        if (prev_symbol_printed == 0) {
            print prev_line;
            prev_symbol_printed = 1;
        }
        print;
    } else {
        prev_symbol_printed = 0;
    }
    prev_addr = addr;
    prev_symbol = symbol;
    prev_line = $0;
}'


> Second, for the symbol table, the following is ELF encoding,
>
> typedef struct {
>         Elf64_Word      st_name;
>         unsigned char   st_info;
>         unsigned char   st_other;
>         Elf64_Half      st_shndx;
>         Elf64_Addr      st_value;
>         Elf64_Xword     st_size;
> } Elf64_Sym;
>
> where
> st_name
>
> An index into the object file's symbol string table, which holds the
> character representations of the symbol names. If the value is nonzero, the
> value represents a string table index that gives the symbol name. Otherwise,
> the symbol table entry has no name.
>
> So, the function name (including @..., @@...) should be in string table
> which is the same for the above two pthread_cond_signal symbols.
>
> I think it is worthwhile to debug why in your situation
> pthread_create@GLIBC_2.2.5 and pthread_create@@GLIBC_2.34 do not
> have them in the string table.
>

I think you are mistaken here; the strings in the string table don't contain
the version. Take a look at this partial dump of the string table.

$ readelf -W -p .dynstr /usr/lib64/libc-2.33.so

String dump of section '.dynstr':
[ 1] xdrmem_create
[ f] __wctomb_chk
[ 1c] getmntent
[ 26] __freelocale
[ 33] __rawmemchr
[ 3f] _IO_vsprintf
[ 4c] getutent
[ 55] __file_change_detection_for_path
(...)
[ 350e] memrchr
[ 3516] pthread_cond_signal
[ 352a] __close
(...)
[ 61b6] GLIBC_2.2.5
[ 61c2] GLIBC_2.2.6
[ 61ce] GLIBC_2.3
[ 61d8] GLIBC_2.3.2
[ 61e4] GLIBC_2.3.3

As you can see, the strings have no versions, and the version strings
themselves are also in this table as entries at the end of the table.
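
To make the consequence concrete, here is a rough, self-contained sketch (not
the code from the patch) of how a tool can rebuild 'name@version' or
'name@@version' from these separate pieces before comparing. Everything in it
is made up for illustration: the tiny dynstr array, the ver_name_off table,
struct sym_view, and the SKETCH_* macros. The one detail taken from the ELF
symbol versioning format is that each .gnu.version entry is 16 bits wide, with
the top bit marking a hidden (non-default) version and the low 15 bits holding
the version index.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hand-built stand-in for .dynstr (hypothetical data, not read from a file).
 * Offsets: 1 = "pthread_create", 16 = "GLIBC_2.2.5", 28 = "GLIBC_2.34". */
static const char dynstr[] = "\0pthread_create\0GLIBC_2.2.5\0GLIBC_2.34";

/* Hypothetical map from version index to an offset in dynstr. In a real
 * file this information comes from .gnu.version_d / .gnu.version_r.
 * Indices 0 and 1 are reserved for local and unversioned global symbols. */
static const unsigned ver_name_off[] = { 0, 0, 16, 28 };

#define SKETCH_VERSYM_HIDDEN 0x8000u /* set: non-default version, shown as '@' */
#define SKETCH_VERSYM_INDEX  0x7fffu /* low 15 bits: version index */

struct sym_view {
	uint32_t name_off; /* plays the role of st_name */
	uint16_t versym;   /* matching entry from .gnu.version */
};

/* Build "name@ver" or "name@@ver" into buf; false if unversioned or too long. */
static bool build_full_name(const struct sym_view *s, char *buf, size_t len)
{
	unsigned idx = s->versym & SKETCH_VERSYM_INDEX;
	const char *sep = (s->versym & SKETCH_VERSYM_HIDDEN) ? "@" : "@@";
	int n;

	if (idx < 2 || idx >= sizeof(ver_name_off) / sizeof(ver_name_off[0]))
		return false; /* unversioned or unknown index */
	n = snprintf(buf, len, "%s%s%s", dynstr + s->name_off, sep,
		     dynstr + ver_name_off[idx]);
	return n >= 0 && (size_t)n < len;
}

/* Compare a user-supplied name, with or without a version, against a symbol. */
static bool sym_matches(const struct sym_view *s, const char *wanted)
{
	char full[256];

	if (!strchr(wanted, '@'))
		return strcmp(dynstr + s->name_off, wanted) == 0;
	return build_full_name(s, full, sizeof(full)) &&
	       strcmp(full, wanted) == 0;
}
```

With a symbol { .name_off = 1, .versym = 0x0003 }, sym_matches() accepts
"pthread_create@@GLIBC_2.34", while { .name_off = 1, .versym = 0x8002 } only
matches "pthread_create@GLIBC_2.2.5". A bare "pthread_create" still matches
both entries in this sketch, which is the ambiguous case described earlier.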

> >
> > This patch reworks how we compare the symbol name provided by the user if it is
> > qualified with a version (using @ or @@). We now look up the correct version
> > string in the version symbol table before constructing the full name, as also
> > done above by nm, before comparing.
> >
> > > > version information would be present in the string found in the
> > > > string table.
> > > >
> > > > We now look up the correct version string in the version symbol
> > > > table before constructing the full name and then comparing.
> > > >
> > > > This patch adds support for both name@version and name@@version to
> > > > match output of the various elf parsers.
> > > >
> > > > Signed-off-by: Espen Grindhaug <espen.grindhaug@xxxxxxxxx>
> > >
> > > [...]