Abstract
Mass-storage secure portable tokens are emerging and provide a real breakthrough in the management of sensitive data. They can embed personal data and/or metadata referencing documents stored encrypted in the Cloud and can manage them under holder’s control. Mass on-board storage requires efficient embedded database techniques. These techniques are however very challenging to design due to a combination of conflicting NAND Flash constraints and scarce RAM constraint, disqualifying known state of the art solutions. To tackle this challenge, we proposes a log-only based storage organization and an appropriate indexing scheme, which (1) produce only sequential writes compatible with the Flash constraints and (2) consume a tiny amount of RAM, independent of the database size. We show the effectiveness of this approach through a comprehensive performance study.
Similar content being viewed by others
Notes
A recent Microsoft survey states that “58 percent of the public and 86 percent of business leaders are excited about the possibilities of cloud computing. But more than 90 percent of them are worried about security and privacy of their data as it rests in the cloud” http://news.cnet.com/8301-1009_3-10437844-83.html.
Bloc nested loop Join is often the only Join algorithm provided in embedded DBMS products (e.g., for SQLite see http://www.sqlite.org/optoverview.html).
This is not the case in high-end SSDs which can use relatively large RAM (e.g., 16 MB) to handle those constraints.
Tests on 20 recent SD cards have shown that random writes are in average 1300 times more costly than sequential writes (min 130×, max 5350×) [30].
A first layer (the Hardware Adaptation Level) of the controller software manages Low Level Drivers (LLD), Error Correction (ECC) and Bad Block Management (BBM). The second layer is the FTL, and it can be bypassed on most platforms.
Moreover, the Flash Translation Layer becomes useless (thereby saving translation costs) and the garbage collection and wear leveling mechanism can be greatly simplified.
While the strategy for handling deletes and updates is rather simple, the details on query compensation is a bit tricky and cannot be included in the paper due to size constraint.
For the sake of clarity, we make here the assumption that at most one join path exists between two tables (e.g., a snowflake schema). The indexing scheme can be extended trivially to the multiple paths case.
For example, the false positive rate using 4 hash functions and allocating 16 bits per value is 0.24 % [10]. Hence, Bloom Filters provide a very flexible way to trade space with performance.
The result tuple identifiers are produced in reversed order, from the most recently inserted to the least recent inserted, which is suitable with the definition of Tselect Index \(\mathrm{I}_{\mathrm{T}_{j}.a\to \mathrm{T}_{i}}\) given in Sect. 4.1.
Since TJoin indexes behave as normal tables, they are handled in the same way by the Merge operations, and thus, are not discussed here.
The work required to recover from a crash during the reorganization can be minimized by storing on Flash the current state of each operation and analyzing this log at recovery time.
Actually, it is worth managing a very small buffer (e.g., 1 page) in RAM to buffer several insertions of the same transaction.
This calibration is important to take into account aspects that cannot be captured by the simulator (e.g., synchronizations problems when accessing the Flash memory). It impacts negatively the performance shown here roughly by a factor of 1.4.
Note that giving a value of p for simple Flash devices like SD cards is difficult since FTL code is proprietary. It is however necessarily rather large because of their reduced cache capabilities. This is confirmed by the ratio between sequential and random writes (between 130 and 5350! [30])
References
Agrawal, D., Abbadi, A.E., Wang, S.: Secure data management in the cloud. In: DNIS (2011)
Agrawal, D., Ganesan, D., Sitaraman, R., Diao, Y., Singh, S.: Lazy-adaptive tree: an optimized index structure for flash devices. In: PVLDB (2009)
Allard, T., Anciaux, N., Bouganim, L., Guo, Y., Le Folgoc, L., Nguyen, B., Pucheral, P., Ray, I., Ray, I., Yin, S.: Secure personal data servers: a vision paper. In: PVLDB (2010)
Allard, T., Anciaux, N., Bouganim, L., Pucheral, P., Thion, R.: Trustworthiness of pervasive healthcare folders. In: Pervasive and Smart Technologies for Healthcare, Information Science Reference (2009)
Anciaux, N., Benzine, M., Bouganim, L., Pucheral, P., Shasha, D.: Revelation on demand. In: DAPD (2009)
Anciaux, N., Bouganim, L., Guo, Y., Pucheral, P., Vandewalle, J.J., Yin, S.: Pluggable personal data servers. In: SIGMOD (2010)
Arge, L.: The buffer tree: a technique for designing batched external data structures. Algorithmica (2003)
Bernstein, P., Reid, C., Das, S.: Hyder—a transactional record manager for shared flash. In: CIDR (2011)
Bityutskiy, A.B.: JFFS3 design issues. Tech. report (2005)
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM (1970)
Bolchini, C., Salice, F., Schreiber, F., Tanca, L.: Logical and physical design issues for smart card databases. In: TOIS (2003)
Bursky, D.: Secure microcontrollers keep data safe. PRN engineering services (2012). http://tinyurl.com/secureMCU
Chan, C.Y., Ioannidis, Y.E.: An efficient bitmap encoding scheme for selection queries. In: SIGMOD (1999)
Debnath, B., Sengupta, S., Li, J.: SkimpyStash: RAM space skimpy key-value store on flash. In: SIGMOD (2011)
Elbaz, R., Champagne, D., Lee, R.B., Torres, L., Sassatelli, G., Guillemin, P.: TEC-tree: a low-cost, parallelizable tree for efficient defense against memory replay attacks. In: CHES (2007)
Eurosmart: Smart USB token. White paper (2008)
Gemmell, J., Bell, G., Lueder, R.: MyLifeBits: a personal database for everything. Commun. ACM 49(1) (2006)
Giesecke devrient: portable security token. http://www.gd-sfs.com/portable-security-token
Haas, L.M., Carey, M.J., Livny, M., Shukla, A.: Seeking the truth about ad hoc join costs. VLDB J. (1997)
Bonnet, P., Bouganim, L., Koltsidas, I., Viglas, S.D.: System co-design and date management for flash devices. In: PVLDB (2011)
Li, Y., He, B., Yang, R.J., Luo, Q., Yi, K.: Tree indexing on solid state drives. In: PVLDB (2010)
Li, Z., Ross, K.A.: Fast joins using join indices. VLDB J. (1999)
Lim, H., Fan, B., Andersen, D., Kaminsky, M.: SILT: a memory-efficient, high-performance key-value store. In: SOSP (2011)
Menezes, A.J., Van Oorschot, P.C., Vanstone, S.A., Rivest, R.L.: Handbook of Applied Cryptography. CRC Press, Boca Raton (2001)
Moglen, E.: FreedomBox. http://freedomboxfoundation.org
Muth, P., O’Neil, P., Pick, A., Weikum, G.: The LHAM log-structured history data access method. VLDB J. (2000)
O’Neil, P., Cheng, E., Gawlick, D., O’Neil, E.: The log-structured merge-tree (LSM-tree). Acta Inform. (1996)
Pucheral, P., Bouganim, L., Valduriez, P., Bobineau, C.: PicoDBMS: scaling down database techniques for the smart card. VLDB J. (2001)
Rosenblum, M., Ousterhout, J.: The design and implementation of a log-structured file system. ACM Trans. Comput. Sci. (1992)
Schmid, P., Roos, A.: SDXC/SDHC memory cards, rounded up and benchmarked. http://tinyurl.com/tom-sdxc
Severance, D., Lohman, G.: Differential files: their application to the maintenance of large databases. ACM Trans. Database Syst. (1976)
Sundaresan, P.: General key indexes. US Patent No. 5870747 (1999)
Vo, H.T., Wang, S., Agrawal, D., Chen, G., Ooi, B.C.: LogBase: scalable log-structured storage system for write-heavy environments. Technical report (2012)
Weininger, A.: Efficient execution of joins in a star schema. In: SIGMOD (2002)
Wu, C., Chang, L., Kuo, T.: An efficient b-tree layer for flash-memory storage systems. In: RTCSA (2003)
Yin, S., Pucheral, P., Meng, X.: A sequential indexing scheme for flash-based embedded systems. In: EDBT (2009)
Acknowledgements
This work has been partially funded by the French ANR KISS project under grant No. ANR-11-INSE-0005. The authors also wish to thank Philippe Bonnet for his accurate comments on early versions of this paper.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Elena Ferrari.
Rights and permissions
About this article
Cite this article
Anciaux, N., Bouganim, L., Pucheral, P. et al. MILo-DB: a personal, secure and portable database machine. Distrib Parallel Databases 32, 37–63 (2014). https://doi.org/10.1007/s10619-012-7119-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10619-012-7119-x