The Volatile keyword
Recently I've interviewed some candidates for entry and intermediate level positions. One of the questions most of them struggled is about the volatile
keyword. Some conversations went like this,
* Q: Why we use volatile
keyword?
* A: It will tell compiler not to use any registers for the volatile
variable.
* Q: Then how will it work in an ARM processor? In ARM no instruction other than load and store can use memory location.
* A: ??!!
- Q: What is the purpose
volatile
keyword? - A: We'll use for IO memory
- Q: Why we need it for IO memory?
- A: So every time processor accesses the memory, it will go to IO device
- Q: So
volatile
is to tell processor not to cache data? - A: Yes
- Q: Thus
volatile
is a processor directive not compiler directive? - And confusion starts
In this post, lets see how volatile
works with two simple C programs. In complex programs with multiple variables and loops volatile
keyword will make significant difference in speed and memory usage.
GCC provides many compiler optimization flags. Enabling them will aggressively optimize the code and give better performance in terms of speed and memory footprint. As these optimizations make debugging harder, they are not suitable development. All available GCC compiler optimization flags can be get from following command.
$ $CC --help=optimizers
The following options control optimizations:
-O<number> Set optimization level to <number>.
-Ofast Optimize for speed disregarding exact standards compliance.
-Og Optimize for debugging experience rather than speed or size.
-Os Optimize for space rather than speed.
-faggressive-loop-optimizations Aggressively optimize loops using language constraints.
.
.
.
For simplicity, I used only -Ofast
optimizer for the examples. It informs GCC to do its best to make the program run faster. We'll see how compiler builds, with and without volatile
. GCC will give assembly code output with -S
options.
Take the following C program.
#include <stdio.h>
int main() {
int *x = (int *)0xc000;
int b = *x;
int c = *x;
*x = b + c;
return 0;
}
x
points to address 0xc000
. We initialize two variables b
and c
with value in address 0xc000
. And then addition of b
and c
is stored in location 0xc000
. So we read the value in location 0xc000
twice in this program. Let see how it gets compiled by GCC form ARMv8.
$ echo $CC
aarch64-poky-linux-gcc --sysroot=/opt/poky/2.4.2/sysroots/aarch64-poky-linux
kaba@kaba-Vostro-1550:~/Desktop/volatile/single_varuable_two_reads
$ $CC -S -Ofast ./code.c -o code.S
kaba@kaba-Vostro-1550:~/Desktop/volatile/single_varuable_two_reads
$
.arch armv8-a
.file "code.c"
.text
.section .text.startup,"ax",@progbits
.align 2
.p2align 3,,7
.global main
.type main, %function
main:
mov x2, 49152
mov w0, 0
ldr w1, [x2]
lsl w1, w1, 1
str w1, [x2]
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
b
and c
have same value from address 0xc000
and addition of them is equivalent to multiplying the value at 0xc000
by two. So it loads the value into register W1
and left shifts it by 1 (equivalent of multiplying with two) and then stores the new value into location 0xc000
.
Now lets change the code to use volatile
for variable x
. And see how the assembly code looks.
#include <stdio.h>
int main() {
volatile int *x = (int *)0xc000;
int b = *x;
int c = *x;
*x = b + c;
return 0;
}
.arch armv8-a
.file "code.c"
.text
.section .text.startup,"ax",@progbits
.align 2
.p2align 3,,7
.global main
.type main, %function
main:
mov x1, 49152
mov w0, 0
ldr w2, [x1]
ldr w3, [x1]
add w2, w2, w3
str w2, [x1]
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
0xc000
may be different each time it reads. It thinks that the variables b
and c
could be initialized with different values. So it reads the location 0xc000
twice and adds both values.
Lets see a simple loop case
#include <stdio.h>
int main() {
int *x = (int *)0xc000;
int *y = (int *)0xd000;
int sum = 0;
for (int i = 0; i < *y; i++) {
sum = sum + *x;
}
*x = sum;
return 0;
}
x
and y
to locations 0xc000
and 0xd000
respectively. It adds the value at x
to itself as many times the value at y
. Lets see how GCC sees it.
.arch armv8-a
.file "code.c"
.text
.section .text.startup,"ax",@progbits
.align 2
.p2align 3,,7
.global main
.type main, %function
main:
mov x0, 53248
ldr w0, [x0]
cmp w0, 0
ble .L3
mov x1, 49152
ldr w1, [x1]
mul w1, w0, w1
.L2:
mov x2, 49152
mov w0, 0
str w1, [x2]
ret
.L3:
mov w1, 0
b .L2
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
X0
to y
and register X1
to x
. The program compares the value at [X0]
- value at the address in X0
- with zero. If so, it jumps to .L3
which sets W1
to zero and jumps to .L2
. Or it simply multiplies [X0]
and [X1]
and stores the value in W1
. .L2
stores the value in W1
at [X2]
and returns. The compiler intelligently identifies that adding [X2]
to itself [X1]
times is equivalent to multiplying both.
With volatile
,
#include <stdio.h>
int main() {
volatile int *x = (int *)0xc000;
int *y = (int *)0xd000;
int sum = 0;
for (int i = 0; i < *y; i++) {
sum = sum + *x;
}
*x = sum;
return 0;
}
.arch armv8-a
.file "code.c"
.text
.section .text.startup,"ax",@progbits
.align 2
.p2align 3,,7
.global main
.type main, %function
main:
mov x0, 53248
ldr w3, [x0]
cmp w3, 0
ble .L4
mov w0, 0
mov w1, 0
mov x4, 49152
.p2align 2
.L3:
ldr w2, [x4]
add w0, w0, 1
cmp w0, w3
add w1, w1, w2
bne .L3
.L2:
mov x2, 49152
mov w0, 0
str w1, [x2]
ret
.L4:
mov w1, 0
b .L2
.size main, .-main
.ident "GCC: (GNU) 7.3.0"
.section .note.GNU-stack,"",@progbits
X4
for the address 0xc000
, but its not significant for our problem. Look here the loop is .L3
. It loads the value at location X4
every time the loop runs, which is different than non-volatile behaviour. This time the compiler things the value at X4
will be different each time it is read. So without any assumption, it adds the value to sum
every time the loop runs.
In both programs the value at the location 0xc000
can be cached by the processor. The subsequent read of the value at 0xc000
could be from processor's cache but not from main memory. It is responsibility of the memory controller to maintain coherency between memory and processor cache. The volatile
keyword has nothing to do here.
I believe these simple programs had explained the concept clear. The volatile
IS
- To tell compiler not to make any assumption about the value stored in the variable
IS NOT
- To tell the compiler not to use any registers to hold the value
- To tell the processor not to cache the value