Name: Abhishek Dubey
Project Name: Optimizing the Pre-dump Algorithm
Organization: CRIU
Mentors:
Commit list (working repo): GitHub link
Official Documentation: Wiki page of product
git clone git@github.com:dubeyabhishek/criu.git
cd criu/
make -j8
# run the complete zdtm test suite
sudo test/zdtm.py run -a --pre=<pre_dump_count> --ignore-taint --keep-going --pre-dump-mode=read
# run a specific zdtm test
sudo python test/zdtm.py run --pre=<pre_dump_count> -t zdtm/<test_case_dir>/<test_case_name> --pre-dump-mode=read
I was involved in GSoC’19 with the CRIU organization, working on the project "Optimizing the Pre-dump Algorithm". The idea behind this project was to reduce the memory pressure and the frozen time of the target process while pre-dumping.
In the current implementation of pre-dump in CRIU, the memory contents (pages) of the target process are first stored in pipes. Later, the pipes are flushed to
disk images or to the page server. Pipes are backed by pipe-buffer pages to store the data, and these pipe-buffer pages are pinned to memory, making them non-swappable.
When the count of pipe-buffer pages is high, the system is left with few swappable pages to serve new memory requests. This puts memory pressure
on the overall system and may cause thrashing, hampering performance. So handling memory pressure is the first issue.
Another issue is the duration for which the current pre-dump algorithm keeps the target process frozen.
The parasite (a blob injected into the target process) in the
current implementation drains the memory pages of the target process into pipes, and the target process remains frozen until all pages are drained. The
longer the frozen time, the longer the pipe pages stay pinned to memory, leading to the memory-pressure situation described above. Another use case that
demands the shortest possible frozen time is live migration. So the second objective is to reduce the frozen time of the target process while pre-dumping.
The optimized implementation must solve the two issues mentioned above. In the new pre-dump approach, the target process is frozen only until the memory mappings are
collected. Then the process is unfrozen and continues running normally, and the draining of process memory starts. We use the
process_vm_readv
syscall to drain pages into a user-space buffer. The syscall takes the memory mappings (collected earlier) as input to perform this task. Since the draining of
memory pages and the process execution happen simultaneously, the running process might modify some memory mappings after
they have been collected by pre-dump.
In such a case, process_vm_readv encounters the stale mappings and fails. This creates a race between pre-dumping and process execution, which
must be handled on the fly for process_vm_readv to successfully drain the complete memory.
Patch # | Title | Status |
---|---|---|
[PATCH 0/7] | GSoC 19: Optimizing the Pre-dump Algorithm | Submitted |
[PATCH 1/7] | Adding --pre-dump-mode option | Submitted |
[PATCH 2/7] | Skip generating iov for non-PROT_READ memory | Submitted |
[PATCH 3/7] | Skip adding PROT_READ to non-PROT_READ mappings | Submitted |
[PATCH 4/7] | Adding cnt_sub for stats manipulation | Submitted |
[PATCH 5/7] | Handle vmsplice failure for read mode pre-dump | Submitted |
[PATCH 6/7] | The read mode pre-dump implementation | Submitted |
[PATCH 7/7] | Refactor time accounting macros | Submitted |
[PATCH 8/7] | Added --pre-dump-mode to libcriu | Submitted |
Handling the processing of iovs corresponding to modified memory mappings is the most interesting issue of this project. A detailed discussion can be found here.
test-config: 1 GB
# python test/zdtm.py run --pre <count> -t zdtm/static/maps04 --pre-dump-mode=<read/splice>
Pre-dump # | splice (original) | read (optimized) |
---|---|---|
1 | 0.59* | 0.66* |
2 | 0.06 | 0.06 |
3 | 0.07 | 0.06 |
4 | 0.06 | 0.06 |
5 | 0.06 | 0.06 |
Total | 0.84 | 0.90 |
Average performance drop: ~7%
* The first pre-dump in the sequence has a ~13% performance drop.
test-config: 1 GB
Pre-dump # | splice (original) | read (optimized) |
---|---|---|
1 | 125.13* | 82.93* |
2 | 76.28 | 65.94 |
3 | 78.05 | 69.52 |
4 | 77.80 | 69.58 |
5 | 74.61 | 65.31 |
Total | 431.87 | 353.28 |
Average frozen-time reduction: ~18%
* The first pre-dump in the sequence is ~35% faster.
The new implementation also reduces memory pressure on the system, since pipes no longer hog memory for long durations.
I am thankful to my mentors Pavel, Andrei, and Mike for their guidance throughout the project. It was a great learning experience working with them. I got the chance to dive deep into the memory-draining part of CRIU, and it was fun. Special thanks to Radostin for his prompt help and feedback. I would love to keep contributing to this wonderful community called CRIU.
Overall, this year’s GSoC was an end-to-end learning experience for me, and now I know "open source" better than ever.