From September 2019 to August 2020, I spent one year doing research in Computer Systems under Professor Yeh-Ching Chung. It was one of the most important experiences of my undergraduate life, and it changed my future career.
The First Semester: Reading Papers
Professor Chung was my instructor for CSC3050 Computer Organization and Architecture and the “only” researcher on Computer Systems at my university. By the end of my sophomore year, I had been doing algorithm research for around three months, which I did not find appealing and which was too tricky for me. In the summer of 2019, I took the Operating Systems course CS162 at UC Berkeley and was fascinated by it. Therefore, I wanted to try Computer Systems research.
During the whole semester, all I did was read papers and give a presentation every week. Professor Chung did not assign any specific topic and let me explore on my own. My reading list was broad, ranging from Cloud Computing at the application level to Storage Systems at the kernel level. Through this reading, I developed the ability to skim new papers and absorb unfamiliar knowledge quickly, which helped me significantly when I searched for my future supervisor.
As I became more fluent at reading and presenting papers, the weekly report took less time. The first one took me about ten hours, while by the end of the semester I needed about two hours to understand the central part of a paper and another two hours to prepare the presentation. If you would like to try research, my suggestion is to start as early as possible and to practice consistently.
Meanwhile, I was still doing my algorithm research on ridesharing, but with little progress. I tried to understand all those dull math papers on non-convex optimization and NP-hard relaxation. Still, the mathematicians seemed reluctant to share their code or other supplementary materials. Although I came up with an algorithm with a theoretical bound, the bound was so loose that it seemed useless in practice, and I felt depressed about algorithm research. Finally, I gave up this research in the spring of 2020.
Finding a Research Problem From Programming Experience
At the beginning of the second semester, Professor Chung asked me to propose a novel topic and start my own research. Although I had read many papers, I felt that the more I read, the more I saw that others had already done, and the less space was left for me to explore.
At that time, I thought distributed storage and databases were an exciting area because I had learned Raft and Paxos. These algorithms are elegant and clear to understand. Then I learned about Redis, a popular in-memory key-value database. Maybe I could do some research on Redis, I thought.
Memory Swap is Dangerous
My experience as the USTF for a database course (ERG3010) taught me the importance of memory and the danger of swapping. As the USTF, I prepared a MySQL server for students to practice on and build their projects. Without external funding, I rented the cheapest server, with 1 CPU core and 1 GB of memory, and disaster struck right before the final presentations, when the server came under a heavy workload. Connections became extremely slow, yet the dashboard showed no sign of an external attack. Eventually, I found that the problem was the limited memory. Because I had not set a memory limit for MySQL, it consumed too much memory, and the OS fell into heavy swapping, making the whole system unresponsive. From this experience, I learned an essential lesson: always limit memory usage.
Data Storage With Expiration
Another experience came from building applications with user management systems. Usually, we need to store user session information with an expiration time. However, there seems to be no standard framework or algorithm for storing data with expiration. Moreover, since memory is a limited and critical resource, it is worth improving the existing expiration algorithms.
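To make the problem concrete, here is a minimal Python sketch of an in-memory store where every entry carries an expiration time, as a session store would. The class `ExpiringStore` and its methods are hypothetical, not part of any library. Expired entries are removed lazily, only when they are read; Redis also uses lazy deletion, combined with a periodic sampling pass, so this sketch covers only half of that picture.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    """Hypothetical key-value store where every entry has a time-to-live (TTL)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # Injectable clock so expiration can be tested deterministically.
        self._clock = clock
        # key -> (value, absolute expiry time)
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazy expiration: a stale entry is deleted only when it is touched.
            del self._data[key]
            return None
        return value

    def __len__(self) -> int:
        # Counts every stored entry, including expired ones not yet touched.
        return len(self._data)


# Usage with a fake clock instead of real time.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set("session:alice", "token-123", ttl_seconds=30)
print(store.get("session:alice"))  # token-123
now[0] = 31.0
print(store.get("session:alice"))  # None
```

The weakness is visible in `__len__`: an expired session that is never read again keeps occupying memory, which is why the eviction policy matters when memory is scarce.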
One Complete Research Process in Three Months
Based on the experiences above, I proposed a research topic: improving the eviction algorithm for data with expiration. I chose Redis as the experimental target. Over the following weeks, I first learned Redis's detailed mechanisms and implementation, especially its eviction process.
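For context, when Redis reaches its `maxmemory` limit it does not scan every key; it samples a few candidates (the `maxmemory-samples` setting) and evicts the best one according to the configured policy. The Python sketch below approximates the `volatile-ttl` policy: among sampled keys that have an expiration, it evicts the one closest to expiring. This is a simplification for illustration only. Redis 5.0 is written in C and also keeps an eviction pool across rounds. The function name and the two-dictionary layout are assumptions.

```python
import random
from typing import Dict, Optional


def evict_volatile_ttl(expires: Dict[str, float], data: Dict[str, str],
                       samples: int = 5,
                       rng: Optional[random.Random] = None) -> Optional[str]:
    """Evict one key, approximating a volatile-ttl policy by random sampling.

    expires: key -> absolute expiry time (only keys with a TTL appear here)
    data:    key -> value (the main keyspace)
    samples: number of candidates to inspect, analogous to maxmemory-samples
    Returns the evicted key, or None if no key has an expiration set.
    """
    if not expires:
        return None
    rng = rng or random.Random()
    candidates = rng.sample(list(expires), min(samples, len(expires)))
    # Among the sampled keys, the one expiring soonest is the cheapest to lose.
    victim = min(candidates, key=lambda k: expires[k])
    del expires[victim]
    data.pop(victim, None)
    return victim


data = {"a": "1", "b": "2", "c": "3"}
expires = {"a": 100.0, "b": 50.0, "c": 75.0}
print(evict_volatile_ttl(expires, data, samples=3))  # b (every key is sampled)
```

A larger `samples` value brings the choice closer to the exact answer but costs more work per eviction. That accuracy-versus-cost trade-off is one reason eviction for data with expiration is worth revisiting.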
Algorithm Implementation in ANSI C
After a week of reading articles and documentation, I understood how Redis handles eviction. Then I cloned Redis 5.0 from GitHub and compiled it on my server. The following week, I implemented my algorithm, debugged it with GDB, and passed all of Redis's built-in tests. This was a tiring but fruitful process, and it is why I enjoy systems programming.
Benchmarking Using YCSB
The next step was to evaluate my algorithm. However, YCSB had no benchmark for data with expiration, so I read YCSB's source code and learned to implement my own benchmark on top of it. Debugging my YCSB changes in Java took one more week. (PS: I dislike Java and seldom write it, but I can pick it up in a day.)
I spent the next week designing and running experiments with Python and shell scripts. Fortunately, I did not have much trouble at this stage since I was experienced with Linux. When the results looked wrong, I went back to check whether Redis, YCSB, or the experiment itself was at fault.
The final step was writing the paper, which took the next month. I wrote it in LaTeX on my PC and struggled a lot with setting up the environment. There were so many figures that migrating my TeX project to other templates or platforms was difficult. If I could rewrite the paper, I would definitely use Overleaf.
Last but not least, I found the MEMSYS conference only about five days before its submission deadline, and I was fortunate to submit my paper in time. Two months later, my work was accepted, with no reviewer recommending rejection. I revised the writing, and the paper was published in 2021.
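The key idea of such a benchmark is a workload in which writes carry a TTL. The Python sketch below generates a hypothetical trace of reads and TTL-bearing writes. It does not use YCSB's actual Java API, and the function name, key format, and parameters are assumptions. Keys are drawn uniformly for simplicity; a realistic benchmark would likely use a skewed distribution such as Zipfian.

```python
import random
from typing import List, Tuple

# (operation, key, ttl_seconds); ttl is 0 for reads.
Op = Tuple[str, str, int]


def generate_ttl_workload(num_ops: int, num_keys: int, read_ratio: float,
                          ttl_range: Tuple[int, int], seed: int = 42) -> List[Op]:
    """Generate a hypothetical read/write trace where every write carries a TTL."""
    # Fixed seed so the same trace can be replayed across experiments.
    rng = random.Random(seed)
    ops: List[Op] = []
    for _ in range(num_ops):
        key = f"user{rng.randrange(num_keys)}"
        if rng.random() < read_ratio:
            ops.append(("GET", key, 0))
        else:
            # TTL drawn uniformly from the inclusive range [low, high].
            ops.append(("SET", key, rng.randint(*ttl_range)))
    return ops


trace = generate_ttl_workload(num_ops=1000, num_keys=100,
                              read_ratio=0.9, ttl_range=(10, 600))
writes = [op for op in trace if op[0] == "SET"]
print(len(trace), len(writes))
```

Seeding the generator makes the trace reproducible, so different eviction algorithms can be compared on exactly the same sequence of operations.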
Personally speaking, my work was not very impactful, but the topic was interesting. If you would like to have a look, here is the link on the ACM Digital Library: https://dl.acm.org/doi/10.1145/3422575.3422797
What Did I Gain?
Although the work itself was modest, I learned the whole research process from it. I am no longer afraid of reading and writing papers or doing research, even in unfamiliar areas. Indeed, it gave me the confidence to pursue a Ph.D. degree in Computer Systems.
Through this experience, I published a paper in the ACM Digital Library, and it really benefited my graduate school applications. Since I am the first author of a two-author paper, it demonstrates my research capability in Computer Systems. After seeing my resume and transcript, most professors were willing to give me an interview, except those at the top US programs.
By the way, here is a list of essential tools for systems research:
- GDB (Tutorial)
- Git (Tutorial)
- Visual Studio Code (for coding; my favorite)
- Maven (for Java YCSB)
- Google Scholar
- Grammarly (for English writing)
Finally, you are welcome to join our study group on CS:APP (Computer Systems: A Programmer's Perspective), an excellent CMU textbook and course on systems programming. If you would like to buy some second-hand programming books, you can refer to this post or contact me ASAP.