
Variadic functions with unknown argument count

One of my colleagues came across a peculiar problem. She had to write an API that accepts a variable number of arguments, but the number of arguments is not passed in the argument list. She solved it neatly with the following hack.

The Hack

The heart of this hack is a macro that counts the number of arguments passed to it. It has one limitation: the maximum number of arguments that can be passed to the macro must be known in advance. For example, if at most 5 arguments can be passed, the macro looks like this:

#define COUNT5(...) _COUNT5(__VA_ARGS__, 5, 4, 3, 2, 1)
#define _COUNT5(a, b, c, d, e, count, ...) count

If you want the macro to count up to 10 arguments:

#define COUNT10(...) _COUNT10(__VA_ARGS__, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
#define _COUNT10(a, b, c, d, e, f, g, h, i, j, count, ...) count

Here is how it works. Consider the macro call below; it expands like this:

COUNT5(99, 98, 97);
  |
  |
  V
_COUNT5(99, 98, 97, 5, 4, 3, 2, 1)
  |
  |
  V
  3

The three arguments passed to COUNT5 occupy parameters a, b and c of _COUNT5. The padding values 5 and 4 occupy d and e. The next value, 3, lands in the count position, so that is what the macro expands to; the remaining values 2 and 1 are absorbed by the ....

Final solution

So she exposed a macro that accepts a variable number of arguments, as the API required. Internally, this macro uses the COUNTX macro to find the number of arguments passed, and then passes the count followed by the arguments to the actual C function.

Example

A small C program using this hack.

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>

int _sum(int count, ...);

#define COUNT(...) _COUNT(__VA_ARGS__, 5, 4, 3, 2, 1)
#define _COUNT(a, b, c, d, e, count, ...) count

#define sum(...) _sum(COUNT(__VA_ARGS__), __VA_ARGS__)

int _sum(int count, ...) {
    va_list arg_ptr;
    int     sum = 0;
    int     i = 0;

    va_start(arg_ptr, count);

    for (i = 0; i < count; i++) {
        sum += va_arg(arg_ptr, int);
    }

    va_end(arg_ptr);   /* every va_start() needs a matching va_end() */

    return sum;
}

int main() {
    printf("%d\n", sum(1, 2, 3, 4, 5));
    printf("%d\n", sum(1, 2, 3));
    printf("%d\n", sum(1));
    printf("%d\n", sum(2, 2, 2, 2, 2));

    return 0;
}

And its output.

kaba@kaba-Vostro-1550:~/variadic
$ gcc variadic.c
kaba@kaba-Vostro-1550:~/variadic
$ ./a.out
15
6
1
10
kaba@kaba-Vostro-1550:~/variadic
$

Custom-built kernel for Raspberry Pi

I've already written a post about how to cross-compile the mainline kernel for Raspberry Pi. In this post I cover how to cross-compile the Raspberry Pi Linux kernel, which is simple and straightforward. I may write a series of posts on kernel debugging and optimization based on the Raspberry Pi kernel, and this post will be the starting point for them.

Directory structure:

balakumaran@balakumaran-pc:~/Desktop/RPi$ ls -lh
total 32K
drwxrwxr-x  3 balakumaran balakumaran 4.0K Mar  9 19:33 firmware
drwxr-xr-x  8 balakumaran balakumaran 4.0K Jan 23 01:52 gcc-linaro-7.4.1-2019.02-x86_64_aarch64-linux-gnu
drwxrwxr-x 22 balakumaran balakumaran 4.0K Mar 30 18:38 kernel_out
drwxrwxr-x 26 balakumaran balakumaran 4.0K Mar 30 18:13 linux-rpi-4.14.y
drwxrwxr-x 18 balakumaran balakumaran 4.0K Mar  9 19:34 rootfs
balakumaran@balakumaran-pc:~/Desktop/RPi$

| Directory     | Purpose                                           |
|---------------|---------------------------------------------------|
| gcc-li...     | GCC cross compiler from Linaro, extracted         |
| firmware/boot | boot directory of the Raspberry Pi firmware repo  |
| kernel_out    | Output directory for the Raspberry Pi kernel      |
| rootfs        | rootfs from Linaro, extracted                     |
| linux-rpi...  | Raspberry Pi kernel repo                          |

I used the Ubuntu rootfs image from Linaro.

Prepare SD card

Create two partitions as follows:

balakumaran@balakumaran-pc:~/Desktop/RPi$ sudo fdisk -l /dev/sdc
[sudo] password for balakumaran:
Disk /dev/sdc: 29.7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xcde890ba

Device     Boot  Start      End  Sectors  Size Id Type
/dev/sdc1  *      2048   133119   131072   64M  b W95 FAT32
/dev/sdc2       133120 62333951 62200832 29.7G 83 Linux
balakumaran@balakumaran-pc:~/Desktop/RPi$

The complete steps are in the Appendix.

Copy necessary files

balakumaran@balakumaran-pc:/media/balakumaran$ sudo mount /dev/sdc1 /mnt/boot/
balakumaran@balakumaran-pc:/media/balakumaran$ sudo mount /dev/sdc2 /mnt/rootfs/
balakumaran@balakumaran-pc:~/Desktop$ sudo cp -rf ~/Desktop/RPi/firmware/boot/* /mnt/boot/
[sudo] password for balakumaran:
balakumaran@balakumaran-pc:~/Desktop$
balakumaran@balakumaran-pc:~/Desktop$ sudo cp -rf ~/Desktop/RPi/rootfs/* /mnt/rootfs/
balakumaran@balakumaran-pc:~/Desktop$

Build and Install kernel

Unless you are ready for some pain, use a stable kernel release.

Set up the following environment variables:

balakumaran@balakumaran-pc:~/Desktop/RPi$ source ~/setup_arm64_build.sh
balakumaran@balakumaran-pc:~/Desktop/RPi$ echo $CROSS_COMPILE
aarch64-linux-gnu-
balakumaran@balakumaran-pc:~/Desktop/RPi$ echo $ARCH
arm64
balakumaran@balakumaran-pc:~/Desktop/RPi$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/balakumaran/Desktop/RPi/gcc-linaro-7.4.1-2019.02-x86_64_aarch64-linux-gnu/bin/
balakumaran@balakumaran-pc:~/Desktop/RPi$

Cross-compile the kernel, device tree and modules:

balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ time make ARCH=arm64 O=../kernel_out/ bcmrpi3_defconfig
make[1]: Entering directory '/home/balakumaran/Desktop/RPi/kernel_out'
.
.
.

balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ time make -j8  ARCH=arm64 O=../kernel_out/
make[1]: Entering directory '/home/balakumaran/Desktop/RPi/kernel_out'
.
.
.

balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ make ARCH=arm64 O=../kernel_out/ dtbs
make[1]: Entering directory '/home/balakumaran/Desktop/RPi/kernel_out'
.
.
.

balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ sudo cp ../kernel_out/arch/arm64/boot/Image /mnt/boot/kernel8.img
[sudo] password for balakumaran:
balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ sudo make  ARCH=arm64 O=../kernel_out/ INSTALL_PATH=/mnt/boot/ dtbs_install
make[1]: Entering directory '/home/balakumaran/Desktop/RPi/kernel_out'
arch/arm64/Makefile:27: ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum
arch/arm64/Makefile:40: LSE atomics not supported by binutils
arch/arm64/Makefile:48: Detected assembler with broken .inst; disassembly will be unreliable
make[3]: Nothing to be done for '__dtbs_install'.
  INSTALL arch/arm64/boot/dts/al/alpine-v2-evp.dtb
.
.
.

Create cmdline.txt and config.txt in the boot partition:

balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ cat /mnt/boot/cmdline.txt
dwc_otg.lpm_enable=0 console=serial0,115200 root=/dev/mmcblk0p2 rootfstype=ext4 rootwait
balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ cat /mnt/boot/config.txt
dtoverlay=pi3-disable-bt
disable_overscan=1
dtparam=audio=on
device_tree=dtbs/4.14.98-v8+/broadcom/bcm2710-rpi-3-b.dtb
overlay_prefix=dtbs/4.14.98-v8+/overlays/
enable_uart=1
balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$

Prepare rootfs

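I'm going to use an ubuntu-base image, with some additional modifications, as the rootfs here. Ubuntu-base releases are available at http://cdimage.ubuntu.com/ubuntu-base/releases/; the latest stable release is usually the best choice. Download and extract the ubuntu-base rootfs, then install the kernel modules into the extracted rootfs. (The commands below refer to the extracted directory as both $HOME/ubuntu-base and $HOME/rootfs; use the path you actually extracted to, consistently.)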

balakumaran@balakumaran-pc:~/Desktop/RPi/linux-rpi-4.14.y$ sudo make  ARCH=arm64 O=../kernel_out/ INSTALL_MOD_PATH=$HOME/ubuntu-base/ modules_install
[sudo] password for balakumaran:
make[1]: Entering directory '/home/balakumaran/Desktop/RPi/kernel_out'
arch/arm64/Makefile:27: ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum
arch/arm64/Makefile:40: LSE atomics not supported by binutils
arch/arm64/Makefile:48: Detected assembler with broken .inst; disassembly will be unreliable
  INSTALL arch/arm64/crypto/aes-neon-blk.ko
.
.
.

Copy the host's resolv.conf into the rootfs so that name resolution works inside the chroot:

$ sudo cp -av /run/systemd/resolve/stub-resolv.conf $HOME/rootfs/etc/resolv.conf

Let's chroot into the new rootfs and install the necessary packages. Since it is an arm64 rootfs, you need QEMU user-mode emulation: install qemu-user-static on the host Ubuntu machine and copy the qemu-aarch64-static binary into the new rootfs. After that, chroot will work.

$ sudo apt install qemu-user-static
.
.
.

$ sudo cp /usr/bin/qemu-aarch64-static $HOME/rootfs/usr/bin/
$ sudo chroot $HOME/rootfs/

Change the root password and install the necessary packages. Because these binaries run under the emulator, they will be a bit slow, but this is a one-time step.

$ passwd root
$ apt-get update
$ apt-get upgrade
$ apt-get install sudo ifupdown net-tools ethtool udev wireless-tools iputils-ping resolvconf wget apt-utils wpasupplicant kmod systemd vim

NOTE: If you see an error like "cannot create key file at /tmp/", fix the permissions of /tmp (world-writable with the sticky bit set, as on a normal system):

$ chmod 1777 /tmp

Download the Raspberry Pi firmware-nonfree package from the Raspberry Pi repository, extract the wireless firmware and copy it into the rootfs. Refer to this answer for more details. As I have an RPi 3B board, I copied brcmfmac43430-sdio.bin and brcmfmac43430-sdio.txt to lib/firmware/brcm.

$ mkdir -p $HOME/rootfs/lib/firmware/brcm/
$ cp brcmfmac43430-sdio.txt brcmfmac43430-sdio.bin $HOME/rootfs/lib/firmware/brcm/

Add a root entry to etc/fstab; otherwise the rootfs will be mounted read-only.

echo "/dev/mmcblk0p2    /   ext4    defaults,noatime    0   1" >> $HOME/rootfs/etc/fstab

I referred to this link for the rootfs preparation. It also explains how to remove unwanted files, though I didn't use those steps.

Reference

  • https://a-delacruz.github.io/ubuntu/rpi3-setup-64bit-kernel
  • https://a-delacruz.github.io/ubuntu/rpi3-setup-filesystem.html
  • https://www.linuxquestions.org/questions/slackware-arm-108/raspberry-pi-3-b-wifi-nic-not-found-4175627137/#post5840054
  • http://cdimage.ubuntu.com/ubuntu-base/releases/
  • https://raspberrypi.stackexchange.com/questions/61319/how-to-add-wifi-drivers-in-custom-kernel

Appendix

Command (m for help): p
Disk /dev/sdc: 29.7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xcde890ba

Device     Boot  Start      End  Sectors  Size Id Type
/dev/sdc1  *      2048   133119   131072   64M  b W95 FAT32
/dev/sdc2       133120 62333951 62200832 29.7G 83 Linux

Command (m for help): d
Partition number (1,2, default 2): 2

Partition 2 has been deleted.

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): p
Disk /dev/sdc: 29.7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xcde890ba

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-62333951, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-62333951, default 62333951): +64M

Created a new partition 1 of type 'Linux' and of size 64 MiB.
Partition #1 contains a vfat signature.

Do you want to remove the signature? [Y]es/[N]o: Y

The signature will be removed by a write command.

Command (m for help): n
Partition type
   p   primary (1 primary, 0 extended, 3 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (2-4, default 2):
First sector (133120-62333951, default 133120):
Last sector, +sectors or +size{K,M,G,T,P} (133120-62333951, default 62333951):

Created a new partition 2 of type 'Linux' and of size 29.7 GiB.
Partition #2 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: Y

The signature will be removed by a write command.

Command (m for help): t
Partition number (1,2, default 2): 1
Hex code (type L to list all codes): L

 0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris
 1  FAT12           27  Hidden NTFS Win 82  Linux swap / So c1  DRDOS/sec (FAT-
 2  XENIX root      39  Plan 9          83  Linux           c4  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  84  OS/2 hidden or  c6  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     85  Linux extended  c7  Syrinx
 5  Extended        41  PPC PReP Boot   86  NTFS volume set da  Non-FS data
 6  FAT16           42  SFS             87  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS/exFAT 4d  QNX4.x          88  Linux plaintext de  Dell Utility
 8  AIX             4e  QNX4.x 2nd part 8e  Linux LVM       df  BootIt
 9  AIX bootable    4f  QNX4.x 3rd part 93  Amoeba          e1  DOS access
 a  OS/2 Boot Manag 50  OnTrack DM      94  Amoeba BBT      e3  DOS R/O
 b  W95 FAT32       51  OnTrack DM6 Aux 9f  BSD/OS          e4  SpeedStor
 c  W95 FAT32 (LBA) 52  CP/M            a0  IBM Thinkpad hi ea  Rufus alignment
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a5  FreeBSD         eb  BeOS fs
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a6  OpenBSD         ee  GPT
10  OPUS            55  EZ-Drive        a7  NeXTSTEP        ef  EFI (FAT-12/16/
11  Hidden FAT12    56  Golden Bow      a8  Darwin UFS      f0  Linux/PA-RISC b
12  Compaq diagnost 5c  Priam Edisk     a9  NetBSD          f1  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       ab  Darwin boot     f4  SpeedStor
16  Hidden FAT16    63  GNU HURD or Sys af  HFS / HFS+      f2  DOS secondary
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fb  VMware VMFS
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fc  VMware VMKCORE
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fd  Linux raid auto
1c  Hidden W95 FAT3 75  PC/IX           bc  Acronis FAT32 L fe  LANstep
1e  Hidden W95 FAT1 80  Old Minix       be  Solaris boot    ff  BBT
Hex code (type L to list all codes): b

Changed type of partition 'Linux' to 'W95 FAT32'.

Command (m for help): t
Partition number (1,2, default 2): 2
Hex code (type L to list all codes): 83

Changed type of partition 'Linux' to 'Linux'.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

balakumaran@balakumaran-pc:/media/balakumaran$
balakumaran@balakumaran-pc:/media/balakumaran$ sudo fdisk -l /dev/sdc
Disk /dev/sdc: 29.7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xcde890ba

Device     Boot  Start      End  Sectors  Size Id Type
/dev/sdc1         2048   133119   131072   64M  b W95 FAT32
/dev/sdc2       133120 62333951 62200832 29.7G 83 Linux
balakumaran@balakumaran-pc:/media/balakumaran$
balakumaran@balakumaran-pc:/media/balakumaran$ sudo mkfs.fat /dev/sdc1
mkfs.fat 4.1 (2017-01-24)
balakumaran@balakumaran-pc:/media/balakumaran$ sudo mkfs.ext4 /dev/sdc2
mke2fs 1.44.4 (18-Aug-2018)
Creating filesystem with 7775104 4k blocks and 1945888 inodes
Filesystem UUID: 5815d093-6381-4db7-b692-32192b24cf9c
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

balakumaran@balakumaran-pc:/media/balakumaran$
balakumaran@balakumaran-pc:/media/balakumaran$ sudo fdisk /dev/sdc

Welcome to fdisk (util-linux 2.32).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): a
Partition number (1,2, default 2): 1

The bootable flag on partition 1 is enabled now.

Command (m for help): p
Disk /dev/sdc: 29.7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xcde890ba

Device     Boot  Start      End  Sectors  Size Id Type
/dev/sdc1  *      2048   133119   131072   64M  b W95 FAT32
/dev/sdc2       133120 62333951 62200832 29.7G 83 Linux

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

balakumaran@balakumaran-pc:/media/balakumaran$

64-bit Mainline kernel on Raspberry Pi 3

Recently I struggled a little to get the vanilla kernel running on a Raspberry Pi 3, and I still don't completely understand the internals. Anyway, sharing the steps may be useful for someone like me.

Download the toolchain and rootfs from Linaro.

Then clone the following repos:

  • Vanilla kernel
  • Raspberry Pi kernel (check out the same version as the vanilla kernel you are going to use)
  • Raspberry Pi firmware (or download only the files under the boot directory of this repo)

I created the directory structure below. You can use a similar one, arranged however is convenient for you.

$ ls ~/Desktop/kernel/
total 44K
drwxr-xr-x  2 kaba kaba 4.0K Sep 23 20:04 downloads
drwxrwxr-x  2 kaba kaba 4.0K Oct  4 10:22 firmware
drwxr-xr-x 22 kaba kaba 4.0K Oct  6 11:55 kernel_out
drwxr-xr-x 18 kaba kaba 4.0K Sep 12  2013 rootfs
drwxr-xr-x 26 kaba kaba 4.0K Oct  3 21:30 rpi_kernel
drwxr-xr-x  2 kaba kaba 4.0K Oct  7 12:13 rpi_out
drwxr-xr-x  3 kaba kaba 4.0K Sep 23 19:43 toolchain
drwxr-xr-x 26 kaba kaba 4.0K Oct  3 22:04 vanila_kernel
kaba@kaba-Vostro-1550:~/Desktop/kernel
$

| Directory     | Purpose                                           |
|---------------|---------------------------------------------------|
| downloads     | Tarballs of the rootfs and toolchain              |
| firmware      | boot directory of the Raspberry Pi firmware repo  |
| kernel_out    | Output directory for the mainline kernel          |
| rootfs        | rootfs tarball, extracted                         |
| rpi_kernel    | Raspberry Pi kernel repo                          |
| rpi_out       | Output directory for the Raspberry Pi kernel      |
| toolchain     | toolchain tarball, extracted                      |
| vanila_kernel | Mainline kernel repo                              |

Export the PATH variable so that it includes the toolchain's bin directory:

$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/kaba/Desktop/kernel/toolchain/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$

Configure and build the 64-bit vanilla kernel:

kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ make O=../kernel_out ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
.
.
.
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ make -j4 O=../kernel_out ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-

Adjust the number after -j to match your machine (for example, the number of CPU cores), and wait for the build to complete.

Now build the device tree in the Raspberry Pi kernel repo:

kaba@kaba-Vostro-1550:~/Desktop/kernel/rpi_kernel
$ make O=../rpi_out ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
make[1]: Entering directory '/home/kaba/Desktop/kernel/rpi_out'
  HOSTCC  scripts/basic/fixdep
  GEN     ./Makefile
  HOSTCC  scripts/kconfig/conf.o
  YACC    scripts/kconfig/zconf.tab.c
  LEX     scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'defconfig'
#
# configuration written to .config
#
make[1]: Leaving directory '/home/kaba/Desktop/kernel/rpi_out'
kaba@kaba-Vostro-1550:~/Desktop/kernel/rpi_kernel
$ make O=../rpi_out ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- dtbs
.
.
.

Partition the memory card into two: the first partition should be FAT32 and marked as the boot partition, and the second should be ext4.

balakumaran@balakumaran-USB:~/Desktop/RPi/linux_build$ sudo parted /dev/sdd
[sudo] password for balakumaran: 
GNU Parted 3.2
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Model: MXT-USB Storage Device (scsi)
Disk /dev/sdd: 31.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  106MB   105MB   primary  fat32        boot, lba
 2      106MB   31.9GB  31.8GB  primary

(parted) help rm                                                          
  rm NUMBER                                delete partition NUMBER

        NUMBER is the partition number used by Linux.  On MS-DOS disk labels, the primary partitions number from 1 to 4, logical partitions from 5 onwards.
(parted) rm 1                                                             
(parted) rm 2                                                             
(parted) print
Model: MXT-USB Storage Device (scsi)
Disk /dev/sdd: 31.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start  End  Size  Type  File system  Flags

(parted) help mkpart
  mkpart PART-TYPE [FS-TYPE] START END     make a partition

        PART-TYPE is one of: primary, logical, extended
        FS-TYPE is one of: zfs, btrfs, nilfs2, ext4, ext3, ext2, fat32, fat16, hfsx, hfs+, hfs, jfs, swsusp, linux-swap(v1), linux-swap(v0), ntfs, reiserfs, freebsd-ufs, hp-ufs, sun-ufs,
        xfs, apfs2, apfs1, asfs, amufs5, amufs4, amufs3, amufs2, amufs1, amufs0, amufs, affs7, affs6, affs5, affs4, affs3, affs2, affs1, affs0, linux-swap, linux-swap(new), linux-swap(old)
        START and END are disk locations, such as 4GB or 10%.  Negative values count from the end of the disk.  For example, -1s specifies exactly the last sector.

        'mkpart' makes a partition without creating a new file system on the partition.  FS-TYPE may be specified to set an appropriate partition ID.
(parted) mkpart primary fat32 2048s 206848s
(parted) mkpart primary ext4 208896s -1s
(parted) print                                                            
Model: MXT-USB Storage Device (scsi)
Disk /dev/sdd: 31.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  106MB   105MB   primary  fat32        lba
 2      107MB   31.9GB  31.8GB  primary  ext4         lba

(parted) set 1 boot on                                                    
(parted) print                                                            
Model: MXT-USB Storage Device (scsi)
Disk /dev/sdd: 31.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  106MB   105MB   primary  fat32        boot, lba
 2      107MB   31.9GB  31.8GB  primary  ext4         lba

(parted) quit                                                             
Information: You may need to update /etc/fstab.

balakumaran@balakumaran-USB:~/Desktop/RPi/linux_build$ sudo fdisk -l /dev/sdd
Disk /dev/sdd: 29.7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xcde890ba

Device     Boot  Start      End  Sectors  Size Id Type
/dev/sdd1  *      2048   206848   204801  100M  c W95 FAT32 (LBA)
/dev/sdd2       208896 62333951 62125056 29.6G 83 Linux
balakumaran@balakumaran-USB:~/Desktop/RPi/linux_build$

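Copy the firmware and kernel to the boot partition. As the parted help above notes, mkpart does not create file systems, so format new partitions with mkfs.fat and mkfs.ext4 first. On this machine the card appears as /dev/sdb, while the partitioning above was done on a machine where it was /dev/sdd.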

kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ sudo mount /dev/sdb1 /mnt/boot/
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ sudo cp ../kernel_out/arch/arm64/boot/Image /mnt/boot/kernel8.img
[sudo] password for kaba: 
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ sudo cp ../firmware/* /mnt/boot/
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$

Install the device tree blobs from the Raspberry Pi repo into the boot partition. The device tree from the upstream kernel did not work for me, and I couldn't find out why.

kaba@kaba-Vostro-1550:~/Desktop/kernel/rpi_kernel
$ make O=../rpi_out ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bcmrpi3_defconfig
kaba@kaba-Vostro-1550:~/Desktop/kernel/rpi_kernel
$ sudo make O=../rpi_out ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- INSTALL_PATH=/mnt/boot/ dtbs_install

Copy the rootfs into the second partition, and install the kernel modules into it as well.

kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ sudo mount /dev/sdb2 /mnt/rootfs/
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ sudo cp -rf ../rootfs/* /mnt/rootfs/
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ sudo make O=../kernel_out ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- INSTALL_MOD_PATH=/mnt/rootfs/ modules_install
kaba@kaba-Vostro-1550:~/Desktop/kernel/vanila_kernel
$ sync

Create config.txt and cmdline.txt as follows. Make sure you update device_tree and overlay_prefix to match your configuration.

kaba@kaba-Vostro-1550:/mnt/boot
$ cat cmdline.txt 
dwc_otg.lpm_enable=0 console=serial0,115200 root=/dev/mmcblk0p2 rootfstype=ext4 rootwait    
kaba@kaba-Vostro-1550:/mnt/boot
$ cat config.txt 
dtoverlay=vc4-fkms-v3d,cma-256
disable_overscan=1
dtparam=audio=on
device_tree=dtbs/4.19.0-rc5-v8+/broadcom/bcm2710-rpi-3-b.dtb
overlay_prefix=dtbs/4.19.0-rc5-v8+/overlays/
enable_uart=1
kaba@kaba-Vostro-1550:/mnt/boot
$

Put the SD card in Raspberry Pi 3 and boot.

The Volatile keyword

Recently I interviewed some candidates for entry- and intermediate-level positions. One question most of them struggled with was about the volatile keyword. Some conversations went like this:

  • Q: Why do we use the volatile keyword?
  • A: It tells the compiler not to use any registers for the volatile variable.
  • Q: Then how would that work on an ARM processor? On ARM, no instructions other than loads and stores can access memory.
  • A: ??!!

  • Q: What is the purpose of the volatile keyword?
  • A: We use it for IO memory.
  • Q: Why do we need it for IO memory?
  • A: So that every time the processor accesses the memory, it goes to the IO device.
  • Q: So volatile tells the processor not to cache data?
  • A: Yes.
  • Q: Then volatile is a processor directive, not a compiler directive?
  • And the confusion starts.

In this post, let's see how volatile works using two simple C programs, each compiled with and without volatile. In complex programs with many variables and loops, the volatile keyword can make a significant difference in speed and memory usage.

GCC provides many optimization flags. Enabling them makes the compiler optimize the code aggressively for speed and memory footprint. Because these optimizations make debugging harder, they are usually not used during development. You can list all of GCC's optimization options with the following command:

$ $CC --help=optimizers
The following options control optimizations:
  -O<number>                  Set optimization level to <number>.
  -Ofast                      Optimize for speed disregarding exact standards compliance.
  -Og                         Optimize for debugging experience rather than speed or size.
  -Os                         Optimize for space rather than speed.
  -faggressive-loop-optimizations Aggressively optimize loops using language constraints.
.
.
.

For simplicity, I used only the -Ofast option for these examples. It tells GCC to do its best to make the program run fast, even at the expense of exact standards compliance. We'll compare how the compiler builds the code with and without volatile. The -S option makes GCC output assembly code.

Take the following C program.

#include <stdio.h>

int main() {
    int *x = (int *)0xc000;
    int b = *x;
    int c = *x;
    *x = b + c;
    return 0;
}

Don't worry about dereferencing an arbitrary address: we are not going to run this program, only generate the assembly code and examine it manually. I use pointers in these programs because volatile makes no sense for an immediate value. The integer pointer x points to address 0xc000. We initialize two variables, b and c, with the value at address 0xc000, and then store b + c back at 0xc000. So this program reads the value at 0xc000 twice. Let's see how GCC compiles it for ARMv8.

$ echo $CC
aarch64-poky-linux-gcc --sysroot=/opt/poky/2.4.2/sysroots/aarch64-poky-linux
kaba@kaba-Vostro-1550:~/Desktop/volatile/single_varuable_two_reads
$ $CC -S -Ofast ./code.c -o code.S
kaba@kaba-Vostro-1550:~/Desktop/volatile/single_varuable_two_reads
$

    .arch armv8-a
    .file   "code.c"
    .text
    .section    .text.startup,"ax",@progbits
    .align  2
    .p2align 3,,7
    .global main
    .type   main, %function
main:
    mov x2, 49152
    mov w0, 0
    ldr w1, [x2]
    lsl w1, w1, 1
    str w1, [x2]
    ret
    .size   main, .-main
    .ident  "GCC: (GNU) 7.3.0"
    .section    .note.GNU-stack,"",@progbits

The compiler notices that b and c both hold the value read from 0xc000 (49152 in the mov instruction), so adding them is equivalent to multiplying that value by two. It therefore loads the value into register W1 once, shifts it left by 1 (equivalent to multiplying by two) and stores the result back at 0xc000.

Now let's make x a pointer to volatile int and see how the assembly code looks.

#include <stdio.h>

int main() {
    volatile int *x = (int *)0xc000;
    int b = *x;
    int c = *x;
    *x = b + c;
    return 0;
}

    .arch armv8-a
    .file   "code.c"
    .text
    .section    .text.startup,"ax",@progbits
    .align  2
    .p2align 3,,7
    .global main
    .type   main, %function
main:
    mov x1, 49152
    mov w0, 0
    ldr w2, [x1]
    ldr w3, [x1]
    add w2, w2, w3
    str w2, [x1]
    ret
    .size   main, .-main
    .ident  "GCC: (GNU) 7.3.0"
    .section    .note.GNU-stack,"",@progbits

This time the compiler cannot assume that the value at 0xc000 is the same on every read, so b and c may hold different values. It therefore reads 0xc000 twice (into W2 and W3) and adds the two values.

Let's look at a simple loop:

#include <stdio.h>

int main() {
    int *x = (int *)0xc000;
    int *y = (int *)0xd000;
    int sum = 0;
    for (int i = 0; i < *y; i++) {
        sum = sum + *x;
    }
    *x = sum;
    return 0;
}

This program initializes two pointers, x and y, to locations 0xc000 and 0xd000 respectively. It adds the value at x to sum as many times as the value at y, then stores the result at x. Let's see what GCC makes of it.

    .arch armv8-a
    .file   "code.c"
    .text
    .section    .text.startup,"ax",@progbits
    .align  2
    .p2align 3,,7
    .global main
    .type   main, %function
main:
    mov x0, 53248
    ldr w0, [x0]
    cmp w0, 0
    ble .L3
    mov x1, 49152
    ldr w1, [x1]
    mul w1, w0, w1
.L2:
    mov x2, 49152
    mov w0, 0
    str w1, [x2]
    ret
.L3:
    mov w1, 0
    b   .L2
    .size   main, .-main
    .ident  "GCC: (GNU) 7.3.0"
    .section    .note.GNU-stack,"",@progbits

The compiler uses register X0 for y and X1 for x. The program loads the value at [X0] (the value at the address in X0) into W0 and compares it with zero. If it is less than or equal to zero, it jumps to .L3, which sets W1 to zero and jumps to .L2. Otherwise it loads the value at [X1] into W1 and multiplies it by W0. .L2 stores W1 at [X2] (0xc000) and returns. The compiler recognizes that adding *x to sum *y times is equivalent to multiplying *x by *y, so the loop disappears entirely.

With volatile:

#include <stdio.h>

int main() {
    volatile int *x = (int *)0xc000;
    int *y = (int *)0xd000;
    int sum = 0;
    for (int i = 0; i < *y; i++) {
        sum = sum + *x;
    }
    *x = sum;
    return 0;
}

The corresponding assembly code is:

    .arch armv8-a
    .file   "code.c"
    .text
    .section    .text.startup,"ax",@progbits
    .align  2
    .p2align 3,,7
    .global main
    .type   main, %function
main:
    mov x0, 53248
    ldr w3, [x0]
    cmp w3, 0
    ble .L4
    mov w0, 0
    mov w1, 0
    mov x4, 49152
    .p2align 2
.L3:
    ldr w2, [x4]
    add w0, w0, 1
    cmp w0, w3
    add w1, w1, w2
    bne .L3
.L2:
    mov x2, 49152
    mov w0, 0
    str w1, [x2]
    ret
.L4:
    mov w1, 0
    b   .L2
    .size   main, .-main
    .ident  "GCC: (GNU) 7.3.0"
    .section    .note.GNU-stack,"",@progbits
This time GCC uses X4 for the address 0xc000, but it's not significant for our problem. Look here: the loop is .L3. It loads the value at location X4 every time the loop runs, which is different from the non-volatile behaviour. This time the compiler assumes the value at X4 may be different each time it is read. So without any assumption, it adds the freshly loaded value to sum every time the loop runs.

In both programs the value at the location 0xc000 can be cached by the processor. A subsequent read of the value at 0xc000 could be served from the processor's cache rather than from main memory. Keeping memory and the processor cache coherent is the job of the hardware (and of how the memory region is mapped), not of the compiler. The volatile keyword has nothing to do with it.

I believe these simple programs explain the concept clearly. The volatile keyword

IS
  • To tell the compiler not to make any assumptions about the value stored in the variable
IS NOT
  • To tell the compiler not to use any registers to hold the value
  • To tell the processor not to cache the value

Anatomy of Linux system call in ARM64

The purpose of an Operating System is to run user applications. But the OS cannot give user applications full control, for security reasons. So to perform privileged operations, applications have to ask the OS to do the job on their behalf. The primary mechanism for this in Linux and similar Operating Systems is the System Call. In this article we are going to see the anatomy of Linux System Calls on the ARM64 architecture. Most people working on applications don't need this. It is only for those who have a printed copy of the Hitchhiker's Guide to the Galaxy. DON'T PANIC!

ARMv8 has four exception levels, EL0 to EL3. EL0 has the lowest privilege; that is where user applications run. EL3 has the highest privilege and runs the Secure Monitor firmware (usually proprietary). A hypervisor runs in EL2 on virtualisation platforms. And our beloved Linux kernel runs in EL1. Moving from one exception level to a higher one is done by taking an exception: code running at one level raises an exception, and the higher level handles it. Explaining all types of exceptions is out of the scope of this article.

The instruction used to raise a synchronous exception from EL0 to EL1 for the system call mechanism is svc, the supervisor call. Thus an application running on Linux issues svc with registers set to appropriate values. To find out what those values are, let's see how the kernel handles svc.

Kernel Part

NOTE: All the code references given in this post are from Linux-4.9.57.

Vector table definition in Kernel

As I have mentioned already, there are multiple exceptions that can be raised by applications [EL0] and taken by the kernel [EL1]. The handlers for these exceptions are stored in a vector table. In ARMv8 the register that holds the base address of that vector table is VBAR_EL1 [Vector Base Address Register for EL1].

From the ARM infocenter: "When an exception occurs, the processor must execute handler code which corresponds to the exception. The location in memory where the handler is stored is called the exception vector. In the ARM architecture, exception vectors are stored in a table, called the exception vector table. Each Exception level has its own vector table, that is, there is one for each of EL3, EL2 and EL1. The table contains instructions to be executed, rather than a set of addresses. Vectors for individual exceptions are located at fixed offsets from the beginning of the table. The virtual address of each table base is set by the Vector Based Address Registers VBAR_EL3, VBAR_EL2 and VBAR_EL1."

As explained above, the exception handlers reside in contiguous memory, and each vector is up to 32 instructions long. Depending on the type of the exception, execution starts from an instruction at a particular offset from the base address in VBAR_EL1. Below is the ARM64 vector table. For example, when a synchronous exception is taken from EL0 running AArch64, the handler at VBAR_EL1 + 0x400 executes to handle it.

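Offset from VBAR_EL1   Exception type    Exception taken from
+0x000                 Synchronous       Current EL with SP0
+0x080                 IRQ/vIRQ          Current EL with SP0
+0x100                 FIQ/vFIQ          Current EL with SP0
+0x180                 SError/vSError    Current EL with SP0
+0x200                 Synchronous       Current EL with SPx
+0x280                 IRQ/vIRQ          Current EL with SPx
+0x300                 FIQ/vFIQ          Current EL with SPx
+0x380                 SError/vSError    Current EL with SPx
+0x400                 Synchronous       Lower EL using AArch64
+0x480                 IRQ/vIRQ          Lower EL using AArch64
+0x500                 FIQ/vFIQ          Lower EL using AArch64
+0x580                 SError/vSError    Lower EL using AArch64
+0x600                 Synchronous       Lower EL using AArch32
+0x680                 IRQ/vIRQ          Lower EL using AArch32
+0x700                 FIQ/vFIQ          Lower EL using AArch32
+0x780                 SError/vSError    Lower EL using AArch32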

Linux defines the vector table at arch/arm64/kernel/entry.S +259. Each ventry is 32 instructions long, and since every ARMv8 instruction is 4 bytes long, the next ventry starts 0x80 (32 * 4 = 128) bytes after the current one.

ENTRY(vectors)
    ventry    el1_sync_invalid           // Synchronous EL1t
    ventry    el1_irq_invalid            // IRQ EL1t
    ventry    el1_fiq_invalid            // FIQ EL1t
    ventry    el1_error_invalid          // Error EL1t

    ventry    el1_sync                   // Synchronous EL1h
    ventry    el1_irq                    // IRQ EL1h
    ventry    el1_fiq_invalid            // FIQ EL1h
    ventry    el1_error_invalid          // Error EL1h

    ventry    el0_sync                   // Synchronous 64-bit EL0
    ventry    el0_irq                    // IRQ 64-bit EL0
    ventry    el0_fiq_invalid            // FIQ 64-bit EL0
    ventry    el0_error_invalid          // Error 64-bit EL0

#ifdef CONFIG_COMPAT
    ventry    el0_sync_compat            // Synchronous 32-bit EL0
    ventry    el0_irq_compat             // IRQ 32-bit EL0
    ventry    el0_fiq_invalid_compat     // FIQ 32-bit EL0
    ventry    el0_error_invalid_compat   // Error 32-bit EL0
#else
    ventry    el0_sync_invalid           // Synchronous 32-bit EL0
    ventry    el0_irq_invalid            // IRQ 32-bit EL0
    ventry    el0_fiq_invalid            // FIQ 32-bit EL0
    ventry    el0_error_invalid          // Error 32-bit EL0
#endif
END(vectors)
Linux then loads the address of the vector table into VBAR_EL1 at arch/arm64/kernel/head.S +433:
    adr_l   x8, vectors                 // load VBAR_EL1 with virtual
    msr     vbar_el1, x8                // vector table address
    isb
VBAR_EL1 is a system register, so it cannot be used like a general purpose register. The special system instructions msr (write) and mrs (read) must be used to manipulate system registers.

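Instruction              Description
adr_l x8, vectors        Load the address of the vector table into general
                         purpose register X8 (adr_l is a kernel assembler
                         macro built on adrp)
msr vbar_el1, x8         Move the value in X8 into system register VBAR_EL1
isb                      Instruction Synchronization Barrier: makes sure the
                         new VBAR_EL1 value is in effect for the instructions
                         that follow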

System call flow in Kernel

Now let's see what happens when an application issues the svc instruction. From the table, the offset for a synchronous exception from a lower EL using AArch64 is +0x400. In the Linux vector definition, el0_sync is the ninth ventry (index 8), and 8 * 0x80 = 0x400, so VBAR_EL1 + 0x400 is el0_sync. Let's go to the el0_sync definition at arch/arm64/kernel/entry.S +458:

el0_sync:
    kernel_entry 0
    mrs    x25, esr_el1                     // read the syndrome register
    lsr    x24, x25, #ESR_ELx_EC_SHIFT      // exception class
    cmp    x24, #ESR_ELx_EC_SVC64           // SVC in 64-bit state
    b.eq    el0_svc
    cmp    x24, #ESR_ELx_EC_DABT_LOW        // data abort in EL0
    b.eq    el0_da
    cmp    x24, #ESR_ELx_EC_IABT_LOW        // instruction abort in EL0
    b.eq    el0_ia
    cmp    x24, #ESR_ELx_EC_FP_ASIMD        // FP/ASIMD access
    b.eq    el0_fpsimd_acc
    cmp    x24, #ESR_ELx_EC_FP_EXC64        // FP/ASIMD exception
    b.eq    el0_fpsimd_exc
    cmp    x24, #ESR_ELx_EC_SYS64           // configurable trap
    b.eq    el0_sys
    cmp    x24, #ESR_ELx_EC_SP_ALIGN        // stack alignment exception
    b.eq    el0_sp_pc
    cmp    x24, #ESR_ELx_EC_PC_ALIGN        // pc alignment exception
    b.eq    el0_sp_pc
    cmp    x24, #ESR_ELx_EC_UNKNOWN         // unknown exception in EL0
    b.eq    el0_undef
    cmp    x24, #ESR_ELx_EC_BREAKPT_LOW     // debug exception in EL0
    b.ge    el0_dbg
    b    el0_inv
The routine is nothing but a chain of if conditions. A synchronous exception can have multiple causes, and the cause is recorded in the syndrome register esr_el1. The code extracts the exception class from the syndrome register, compares it with predefined macros, and branches to the corresponding handler.

Instruction                        Description
kernel_entry 0                     A macro defined at arch/arm64/kernel/entry.S +71.
                                   It saves all the general purpose registers of the
                                   interrupted application into a register frame
                                   (struct pt_regs) on the kernel stack, so that the
                                   user context can be restored on return to EL0.
mrs x25, esr_el1                   Move system register esr_el1 into general purpose
                                   register X25. esr_el1 is the Exception Syndrome
                                   Register; it holds the syndrome code describing
                                   what caused the exception.
lsr x24, x25, #ESR_ELx_EC_SHIFT    Logical shift right X25 by ESR_ELx_EC_SHIFT bits
                                   and store the result in X24. This extracts the
                                   exception class (EC) field.
cmp x24, #ESR_ELx_EC_SVC64         Compare the value in X24 with ESR_ELx_EC_SVC64.
                                   If both are equal, the Z flag is set in the NZCV
                                   special purpose register.
b.eq el0_svc                       If the Z flag is set, branch to el0_svc. It is
                                   just b, not bl, so control will not come back
                                   here. The checks continue until the matching
                                   cause is found; if none matches, control
                                   branches to el0_inv.

In the system call case, control branches to el0_svc. It is defined at arch/arm64/kernel/entry.S +742 as follows:

/*
 * SVC handler.
 */
    .align    6
el0_svc:
    adrp    stbl, sys_call_table            // load syscall table pointer
    uxtw    scno, w8                        // syscall number in w8
    mov     sc_nr, #__NR_syscalls
el0_svc_naked:                              // compat entry point
    stp     x0, scno, [sp, #S_ORIG_X0]      // save the original x0 and syscall number
    enable_dbg_and_irq
    ct_user_exit 1

    ldr     x16, [tsk, #TI_FLAGS]           // check for syscall hooks
    tst     x16, #_TIF_SYSCALL_WORK
    b.ne    __sys_trace
    cmp     scno, sc_nr                     // check upper syscall limit
    b.hs    ni_sys
    ldr     x16, [stbl, scno, lsl #3]       // address in the syscall table
    blr     x16                             // call sys_* routine
    b       ret_fast_syscall
ni_sys:
    mov     x0, sp
    bl      do_ni_syscall
    b       ret_fast_syscall
ENDPROC(el0_svc)
Before going through the code, let me introduce the register aliases defined at arch/arm64/kernel/entry.S +229:
/*
 * These are the registers used in the syscall handler, and allow us to
 * have in theory up to 7 arguments to a function - x0 to x6.
 *
 * x7 is reserved for the system call number in 32-bit mode.
 */
sc_nr   .req    x25        // number of system calls
scno    .req    x26        // syscall number
stbl    .req    x27        // syscall table pointer
tsk     .req    x28        // current thread_info
Now let's walk through el0_svc:

Instruction                        Description
adrp stbl, sys_call_table          sys_call_table is covered in the next section: it
                                   is a table of sys_* functions indexed by syscall
                                   number. adrp forms a PC-relative address: it takes
                                   the address of the 4KB page containing the PC,
                                   adds a signed page offset encoded in the
                                   instruction, and stores the resulting page
                                   address in stbl. The low 12 bits of the result
                                   are always zero, which is why sys_call_table has
                                   to be 4K aligned: then the page address is
                                   exactly the table address.
uxtw scno, w8                      Unsigned extend word: zero-extend the 32-bit W8
                                   (the syscall number set by the application) into
                                   the 64-bit scno.
mov sc_nr, #__NR_syscalls          Load sc_nr with the number of system calls.
stp x0, scno, [sp, #S_ORIG_X0]     Store the pair X0 and scno at stack pointer +
                                   S_ORIG_X0. S_ORIG_X0 is the offset of the orig_x0
                                   field in the register frame that kernel_entry
                                   built on the stack, so the original first
                                   argument and the syscall number are kept there
                                   for later use.
enable_dbg_and_irq                 A macro defined at
                                   arch/arm64/include/asm/assembler.h +88. It
                                   enables IRQs and debug exceptions by clearing the
                                   D and I mask bits in the special purpose register
                                   DAIF.
ct_user_exit 1                     Another macro, not very important unless you
                                   care about CONFIG_CONTEXT_TRACKING.
ldr x16, [tsk, #TI_FLAGS]          These instructions are related to syscall hooks
tst x16, #_TIF_SYSCALL_WORK        (ptrace, audit, tracepoints, seccomp). For those
b.ne __sys_trace                   who got confused about b.ne like me: tst does a
                                   bitwise AND of its operands and only sets the
                                   flags; Z is set if the result is zero. b.ne
                                   branches when Z is clear, that is when any
                                   _TIF_SYSCALL_WORK flag is set, and then
                                   __sys_trace handles the call.
cmp scno, sc_nr                    Just an error check. If the syscall number is
b.hs ni_sys                        greater than or equal to sc_nr (hs: unsigned
                                   "higher or same"), go to ni_sys.
ldr x16, [stbl, scno, lsl #3]      Load the address of the corresponding sys_*
                                   function into X16. Explained in detail in the
                                   next section.
blr x16                            Subroutine call to the actual sys_* function.
b ret_fast_syscall                 Branch to the return path, which saves the
                                   return value from X0 into the register frame and
                                   returns to the application. Control does not
                                   flow further down.

sys_call_table

It is nothing but an array of function pointers indexed by the system call number. It has to be placed in 4K aligned memory so that the single adrp in el0_svc yields its exact address. For ARM64, sys_call_table is defined at arch/arm64/kernel/sys.c +55.

#undef __SYSCALL
#define __SYSCALL(nr, sym)  [nr] = sym,

/*
 * The sys_call_table array must be 4K aligned to be accessible from
 * kernel/entry.S.
 */
void * const sys_call_table[__NR_syscalls] __aligned(4096) = {
    [0 ... __NR_syscalls - 1] = sys_ni_syscall,
#include <asm/unistd.h>
};
  • __NR_syscalls is the number of system calls. It varies from architecture to architecture.
  • Initially every entry is set to sys_ni_syscall, the "not implemented" system call, using the GCC range designator [0 ... __NR_syscalls - 1]. Because __SYSCALL is redefined just above as [nr] = sym, each __SYSCALL line in the included header becomes a designated initializer that overrides that default. If a system call is removed, its number is not reused; its entry is left pointing at sys_ni_syscall instead.
  • The include chain goes arch/arm64/include/asm/unistd.h -> arch/arm64/include/uapi/asm/unistd.h -> include/asm-generic/unistd.h -> include/uapi/asm-generic/unistd.h. The last file has the definitions of all system calls. For example, the write system call is defined there as
#define __NR_write 64
__SYSCALL(__NR_write, sys_write)
  • The sys_call_table is an array of function pointers. In ARM64 a function pointer is 8 bytes long, so to compute the address of the entry for a system call, el0_svc left shifts the system call number scno by 3 (multiplies it by 8) and adds it to the system call table address stbl: ldr x16, [stbl, scno, lsl #3]

System call definition

Each system call is defined with a SYSCALL_DEFINEn macro, where n is the number of arguments the system call accepts. For example, write is implemented at fs/read_write.c +599:

SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
        size_t, count)
{
    struct fd f = fdget_pos(fd);
    ssize_t ret = -EBADF;

    if (f.file) {
        loff_t pos = file_pos_read(f.file);
        ret = vfs_write(f.file, buf, count, &pos);
        if (ret >= 0)
            file_pos_write(f.file, pos);
        fdput_pos(f);
    }

    return ret;
}
This macro expands into the sys_write function definition and some alias functions, as described in the LWN article listed in the references. The expanded function is declared asmlinkage. On some architectures, such as 32-bit x86, asmlinkage tells the compiler to take all arguments from the stack, but ARM64 does not redefine it: there the sys_* functions take their arguments in registers, following the normal AArch64 procedure call standard (X0 to X5 for the up to six system call arguments). The kernel_entry macro in el0_sync saves all general purpose registers on the kernel stack to preserve the user context for the return to EL0; the arguments reach the sys_* function in registers.

Application part

A user application should do the following steps to make a system call:
  • Put the system call arguments in X0 to X5
  • Set the lower 32 bits of general purpose register X8 (W8) to the appropriate system call number; el0_svc loads the system call number from W8
  • Then issue the svc instruction
  • Now the kernel's sys_* function runs on behalf of the application, and its return value comes back in X0

Normally glibc does all the heavy lifting. Depending on glibc also makes the application portable across platforms that glibc supports: system call numbers differ from platform to platform, and glibc is built with the right numbers for each one.

printf implementation

In this section we'll see how a glibc printf ends up as a system call. Going deep into all of glibc's macro expansions is out of the scope of this post. At the end of the day, printf has to become a write to the stdout file. Let's see that.

I wrote a simple hello world program:

kaba@kaba-Vostro-1550:~/Desktop/workbench/code
$ cat syscall.c
#include <stdio.h>

int main()
{
    printf("Hello world!\n");
    return 0;
}
kaba@kaba-Vostro-1550:~/Desktop/workbench/code
$ 
Compile it with debug symbols (-g) and copy it to an ARM64 target [Raspberry Pi 3 in my case]. Run it on the target under gdbserver and connect a remote GDB as described in the previous post.

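The last function in glibc that issues svc is __GI___libc_write; I found it with the help of GDB. If you really want to trace this function through the glibc code all the way from printf, prepare yourself. Here is the GDB backtrace output. Note that main calls _IO_puts rather than printf: GCC replaces a printf of a constant string ending in a newline with an equivalent puts call.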

(gdb) bt
#0  __GI___libc_write (fd=1, buf=0x412260, nbytes=13) at /usr/src/debug/glibc/2.26-r0/git/sysdeps/unix/sysv/linux/write.c:26
#1  0x0000007fb7eeddcc in _IO_new_file_write (f=0x7fb7fcd540 <_IO_2_1_stdout_>, data=0x412260, n=13) at /usr/src/debug/glibc/2.26-r0/git/libio/fileops.c:1255
#2  0x0000007fb7eed160 in new_do_write (fp=0x7fb7fcd540 <_IO_2_1_stdout_>, data=0x412260 "Hello world!\n", to_do=to_do@entry=13) at /usr/src/debug/glibc/2.26-r0/git/libio/fileops.c:510
#3  0x0000007fb7eeefc4 in _IO_new_do_write (fp=fp@entry=0x7fb7fcd540 <_IO_2_1_stdout_>, data=<optimized out>, to_do=13) at /usr/src/debug/glibc/2.26-r0/git/libio/fileops.c:486
#4  0x0000007fb7eef3f0 in _IO_new_file_overflow (f=0x7fb7fcd540 <_IO_2_1_stdout_>, ch=10) at /usr/src/debug/glibc/2.26-r0/git/libio/fileops.c:851
#5  0x0000007fb7ee3d78 in _IO_puts (str=0x400638 "Hello world!") at /usr/src/debug/glibc/2.26-r0/git/libio/ioputs.c:41
#6  0x0000000000400578 in main () at syscall.c:5
Let's see the assembly instructions of the __GI___libc_write function in GDB TUI:
   │0x7fb7f407a8 <__GI___libc_write>        stp    x29, x30, [sp, #-48]!
  >│0x7fb7f407ac <__GI___libc_write+4>      adrp   x3, 0x7fb7fd1000 <__libc_pthread_functions+184>
   │0x7fb7f407b0 <__GI___libc_write+8>      mov    x29, sp
   │0x7fb7f407b4 <__GI___libc_write+12>     str    x19, [sp, #16]
   │0x7fb7f407b8 <__GI___libc_write+16>     sxtw   x19, w0
   │0x7fb7f407bc <__GI___libc_write+20>     ldr    w0, [x3, #264]
   │0x7fb7f407c0 <__GI___libc_write+24>     cbnz   w0, 0x7fb7f407f0 <__GI___libc_write+72>
   │0x7fb7f407c4 <__GI___libc_write+28>     mov    x0, x19
   │0x7fb7f407c8 <__GI___libc_write+32>     mov    x8, #0x40                       // #64
   │0x7fb7f407cc <__GI___libc_write+36>     svc    #0x0
   .
   .
   .

Instruction                   Description
stp x29, x30, [sp, #-48]!     Decrement the stack pointer by 48 (pre-index
                              with write-back) and save the frame pointer and
                              link register at the new top of the stack.
adrp x3, 0x7fb7fd1000         Load the PC-relative page address of the global
                              checked by SINGLE_THREAD_P.
mov x29, sp                   Move the current stack pointer into the frame
                              pointer.
str x19, [sp, #16]            Back up X19 (a callee-saved register) on the
                              stack.
sxtw x19, w0                  Sign-extend W0, the first parameter (fd), into
                              X19. This function has 3 parameters, in X0, X1
                              and X2. X0 is about to be reused, so fd has to
                              be kept somewhere else first.
ldr w0, [x3, #264]            Load the global into W0. This global tells
                              whether it is a multi-threaded program.
cbnz w0, 0x7fb7f407f0         Conditional branch if W0 is non zero. This is
                              the jump taken by multi-threaded programs. Our
                              program is single threaded, so it falls through.
mov x0, x19                   Copy X19 back into X0, the first argument: the
                              fd.
mov x8, #0x40                 Load X8 with 0x40. The kernel looks for the
                              system call number in X8, so we load it with 64,
                              the system call number of write.
svc #0x0                      All set: X0, X1 and X2 hold fd, buf and nbytes.
                              Now make the supervisor call.

From this point the almighty Linux kernel takes control. Pretty long article, ahh! At last the Improbability Drive comes to equilibrium as you complete this.

References

  • http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch09s01s01.html
  • http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CHDEEDDC.html
  • https://lwn.net/Articles/604287/
  • https://courses.cs.washington.edu/courses/cse469/18wi/Materials/arm64.pdf
  • http://www.osteras.info/personal/2013/10/11/hello-world-analysis.html
  • And the Linux kernel source. Thanks to https://elixir.bootlin.com