Scylla University: New Lessons for February 2021
by Guy Shtub
In my previous blog post, I wrote about the top students for 2020, the Scylla Summit Training Day, getting course completion certificates, and other news. In this blog post I’ll talk about new lessons added to Scylla University since our June 2020 update.
New CDC lesson: Change Data Capture, or CDC, is a feature that allows you to query recent changes made to data in specified tables. CDC lets users build streaming data pipelines that process and analyze data in real time and react immediately to modifications occurring in the database. Some of the topics covered in this lesson are:
- An overview of Change Data Capture: what it is, what it does, some common use cases, and how it works
- How can that data be consumed? Different options for consuming the data changes, including normal CQL, a layered approach, and integrators
- How does CDC work under the hood? Walks through an example of what happens in the database on different operations to support CDC
- A summary of CDC: It’s easy to integrate and consume, it uses plain CQL tables, it’s robust, it’s replicated in the same way as regular data, it has a reasonable overhead, it does not overflow if the consumer fails to act and data is TTL’ed. The summary also includes a comparison with other NoSQL CDC implementations in Cassandra, DynamoDB, and MongoDB.
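The idea summarized above, a change log that sits next to the base table and expires by TTL, can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyCdcTable`, `LogEntry`, and the operation names are hypothetical, and the real CDC log is a CQL table queried with normal CQL rather than a Python list.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class LogEntry:
    written_at: float          # when the change was recorded
    operation: str             # hypothetical labels: "update" or "row_delete"
    key: str
    values: Dict[str, Any]


class ToyCdcTable:
    """Toy base table that records every change in a TTL'ed log."""

    def __init__(self, log_ttl_seconds: float) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}
        self.log: List[LogEntry] = []
        self.log_ttl_seconds = log_ttl_seconds

    def write(self, key: str, values: Dict[str, Any], now: float) -> None:
        # Apply the change to the base table and append it to the log.
        self.rows.setdefault(key, {}).update(values)
        self.log.append(LogEntry(now, "update", key, dict(values)))

    def delete(self, key: str, now: float) -> None:
        self.rows.pop(key, None)
        self.log.append(LogEntry(now, "row_delete", key, {}))

    def expire(self, now: float) -> None:
        # Entries older than the TTL vanish even if nobody consumed them,
        # so a stalled consumer cannot make the log grow without bound.
        self.log = [e for e in self.log
                    if now - e.written_at < self.log_ttl_seconds]

    def changes_since(self, cursor: float, now: float) -> List[LogEntry]:
        # A consumer polls for entries newer than the last one it processed.
        self.expire(now)
        return [e for e in self.log if e.written_at > cursor]
```

A consumer would keep the timestamp of the last entry it handled and pass it as `cursor` on the next poll.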
Workload Prioritization: Using Service Level CQL commands, database administrators working on Scylla Enterprise can set a different workload prioritization (level of service) for each workload without sacrificing latency or throughput. Each service level can also be attached to your organization’s various roles, ensuring that each role is granted the level of service it requires.
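As a rough illustration of what a "level of service" means under contention, the sketch below assumes each service level carries a number of shares and that contended resources are divided in proportion to those shares. The function name `resource_split` and the service level names are hypothetical, and the real scheduler is considerably more involved.

```python
from typing import Dict


def resource_split(shares: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of contended resources each service level gets.

    Assumption: allocation is proportional to the shares of each level.
    """
    if not shares or any(s <= 0 for s in shares.values()):
        raise ValueError("every service level needs a positive share count")
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}
```

For example, `resource_split({"oltp": 800, "olap": 200})` gives the hypothetical `oltp` level four times the resources of `olap` whenever both workloads compete.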
Materialized Views + Secondary Indexes: This lesson includes nine topics, three quizzes, an MV lab, and an SI lab. In Scylla (and Apache Cassandra), data is divided into partitions, rows, and values, which can be identified by a partition key. Sometimes the application needs to find a value by the value of another column. Doing this efficiently without scanning all of the partitions requires indexing, the focus of this lesson. There are three indexing options available in Scylla: Materialized Views, Global Secondary Indexes, and Local Secondary Indexes. The lesson covers all three, compares them, and gives examples of when to use each, along with quizzes and hands-on labs.
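To see why looking up rows by a non-key column needs an index, here is a minimal in-memory sketch. `ToyBaseTable` and the `city` column are hypothetical; the second dictionary plays the role of a view or index that is kept in sync on every write, which is the general idea rather than Scylla's actual implementation.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ToyBaseTable:
    """Rows keyed by partition key, plus a lookup structure keyed by city."""

    def __init__(self) -> None:
        self.partitions: Dict[int, Dict[str, Any]] = {}
        self.by_city = defaultdict(set)   # city -> set of partition keys

    def upsert(self, user_id: int, row: Dict[str, Any]) -> None:
        old = self.partitions.get(user_id)
        if old is not None:
            # Remove the stale entry so the lookup structure stays correct.
            self.by_city[old["city"]].discard(user_id)
        self.partitions[user_id] = dict(row)
        self.by_city[row["city"]].add(user_id)

    def find_by_city_scan(self, city: str) -> List[int]:
        # Without an index: every partition has to be inspected.
        return sorted(k for k, r in self.partitions.items()
                      if r["city"] == city)

    def find_by_city_indexed(self, city: str) -> List[int]:
        # With an index: one lookup by the indexed value.
        return sorted(self.by_city.get(city, ()))
```

Both query methods return the same keys; the difference is that the scan touches every partition, while the indexed lookup pays its cost at write time instead.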
New LWT lesson: There are cases when it is necessary to modify data based on its current state: that is, to perform an update that is executed only if a row does not exist or contains a particular value. Lightweight Transactions (LWTs) provide this functionality by only allowing data changes to occur if the condition provided evaluates as true. The conditional statements provide linearizable semantics, thus allowing data to remain consistent. A basic rule of thumb is that any statement with an IF clause is a conditional statement. A batch that has at least one conditional statement is a conditional batch. Conditional statements and conditional batches are executed atomically as an LWT. This lesson provides an overview of LWT, an example of how it’s used, and a comparison with Apache Cassandra.
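The conditional behavior described above can be mimicked with a small single-process model. This is a sketch only: `ToyTable` and its methods are hypothetical, it ignores the distributed consensus that makes real LWTs linearizable, and it treats a missing row as a failed condition. Each method returns an "applied" flag plus the existing row when the condition fails, loosely modeled on the `[applied]` column that conditional CQL statements return.

```python
from typing import Any, Dict, Optional, Tuple

Row = Dict[str, Any]


class ToyTable:
    """In-memory stand-in for a table with a single-column primary key."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}

    def insert_if_not_exists(self, key: str, row: Row) -> Tuple[bool, Optional[Row]]:
        # Analogous to: INSERT ... IF NOT EXISTS
        existing = self.rows.get(key)
        if existing is not None:
            return False, dict(existing)
        self.rows[key] = dict(row)
        return True, None

    def update_if(self, key: str, column: str, expected: Any,
                  new_value: Any) -> Tuple[bool, Optional[Row]]:
        # Analogous to: UPDATE ... SET column = new_value ... IF column = expected
        existing = self.rows.get(key)
        if existing is None:
            return False, None
        if existing.get(column) != expected:
            return False, dict(existing)
        existing[column] = new_value
        return True, None
```

A second `insert_if_not_exists` for the same key reports that it was not applied and hands back the row that won, which is how an application can detect that another writer got there first.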
CPP Part 2: Using Prepared Statements with the CPP (C++) Driver to connect to a Scylla cluster and perform queries.
Scala lesson part one and part two: How to use the Phantom Scala driver to create a sample Scala application that executes a few basic CQL operations against a Scylla cluster.
Operations and Management
Security: This lesson covers security features and the way that Scylla handles security. By the end of this lesson, you’ll understand why security is essential in Scylla, the different security features, and how it works. Some of the topics covered in this lesson are:
- Why is it important to secure your data? Business value is increasingly tied to data. Security properties such as Identity, Authentication, Confidentiality, Availability, Integrity, and Non-repudiation
- How to manage identities with users? Identity, Authentication, Users and passwords, and Availability
- What is Authentication, and how does it limit access to the cluster to identified clients? Authentication is the process where login accounts and their passwords are verified, and the user is allowed access to the database.
- Users and passwords are created with roles using a GRANT statement. This procedure enables Authentication on the Scylla servers. However, once complete, all clients (applications using Scylla/Apache Cassandra drivers) will stop working until they are updated to work with Authentication as well.
- The concepts of roles and permissions, Confidentiality, Non-repudiation
- What is Authorization? How are users granted permissions which entitle them to access or change data on specific keyspaces, tables, or an entire data center? Role-Based Access Control reduces lists of authorized users to a few roles assigned to multiple users. It also includes an example.
- Encryption In Transit (Client to Node and Node to Node) and an overview of Encryption At Rest, which covers data stored in Tables, System, and Providers.
- Encryption at Rest, or how to encrypt user data as it is stored on disk. This is invisible to the client and available on Scylla Enterprise. It uses disk block encryption and has a minimal impact on performance.
- Auditing enables us to know who did, looked at, or changed what, and when, by logging the activities a user performs on the Scylla cluster.
- The importance of ensuring that Scylla runs in a trusted network environment: limit access to IP / Port by role, apply the minimal privileges principle, avoid public IPs if possible, and use a VPC if possible. Security is an ongoing process. Ensure that you routinely upgrade to the latest Scylla and OS versions, routinely check for network exposure, routinely replace keys/passwords, use 2FA (Scylla Cloud), follow the minimal privilege principle, and apply the available security features.
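The Role-Based Access Control idea from the list above, where permissions are attached to a few roles and roles are assigned to many users, can be sketched as follows. Everything here is a hypothetical illustration: `ToyAccessControl`, the role names, and the resource strings are made up, and the real feature is managed through CQL statements rather than Python objects.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]   # (action, resource), e.g. ("SELECT", "shop.orders")


class ToyAccessControl:
    """Users get permissions only through the roles granted to them."""

    def __init__(self) -> None:
        self.role_permissions: Dict[str, Set[Permission]] = {}
        self.user_roles: Dict[str, Set[str]] = {}

    def grant_permission(self, role: str, action: str, resource: str) -> None:
        self.role_permissions.setdefault(role, set()).add((action, resource))

    def grant_role(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        # Minimal privilege: anything not explicitly granted is denied.
        return any(
            (action, resource) in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )
```

Adding a new analyst then takes a single `grant_role` call instead of a fresh list of per-table permissions.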
Using Spark with Scylla: By using Spark together with Scylla, users can deploy analytics workloads on the information stored in the transactional system. This lesson gives an overview of Scylla, Spark, and how they can work together.
Configuration and Where to Run Scylla: Covers Scylla configuration, setup, and best practices. By the end of this lesson, you’ll better understand the practical aspects of Scylla installation and configuration.
How to Write Better Apps: In this lesson, you’ll learn how to write better applications. This is an intermediate to advanced level lesson. By the end of this lesson, you’ll have a better understanding of application development, caveats, performance, and what you should or shouldn’t do.
Scylla Drivers: Three new lessons were added to the course. They cover an overview of the Scylla Token Ring architecture, Scylla-specific (shard-aware) drivers and why it is important to use them, and what paging and shard awareness are.
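A rough sketch of the token ring idea: a partition key is hashed to a token, and the node whose token is the first one at or after it on the ring owns the key, wrapping around at the end. The names `toy_token`, `toy_shard`, and `ToyRing` are hypothetical, MD5 stands in for the real partitioner hash, and the token-to-shard mapping below is a deliberate simplification of what a shard-aware driver computes.

```python
import bisect
import hashlib
from typing import Dict


def toy_token(partition_key: str) -> int:
    # Stand-in hash: maps a key to a signed 64-bit token.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def toy_shard(token: int, shard_count: int) -> int:
    # Simplified assumption: spread tokens over the CPU shards of one node.
    return token % shard_count


class ToyRing:
    def __init__(self, node_tokens: Dict[str, int]) -> None:
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner_of_token(self, token: int) -> str:
        # First node token >= the key's token; wrap past the last node.
        idx = bisect.bisect_left(self._tokens, token) % len(self._ring)
        return self._ring[idx][1]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(toy_token(partition_key))
```

A token-aware driver uses this kind of calculation to send a request straight to a node that owns the data; a shard-aware driver goes one step further and targets the CPU shard on that node.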
Alternator: The course is updated with new lessons and quizzes covering Alternator (Scylla’s DynamoDB-compatible API) in action, how it works, implementation details, and a hands-on lab to help you get started.
Manager Repair Tombstones: This lesson covers what repair is and why it is needed, what Tombstones are and why they are important, and Scylla Manager. Scylla Manager is a centralized cluster administration and recurrent tasks automation tool. Scylla Manager can schedule tasks such as repairs and backups. Scylla Repair is a process that runs in the background and synchronizes the data between nodes so that eventually, all the replicas hold the same data. Data stored on nodes can become inconsistent with other replicas over time, which is why repairs are a necessary part of database maintenance. Using Scylla repair makes data on the node consistent with the other nodes in the cluster. The best use of Scylla repair is to have the Scylla Manager schedule and run the repairs for you.
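The toy model below shows why repair and tombstones belong in the same lesson. Each replica is a dictionary of timestamped cells, a delete is stored as a tombstone marker rather than by removing the cell, and `repair` makes every replica keep the newest cell per key. The names `TOMBSTONE`, `repair`, and `read` are hypothetical, and the real repair process works very differently internally; only the "newest write wins" outcome is modeled.

```python
from typing import Any, Dict, List, Tuple

TOMBSTONE = object()   # marker meaning "this key was deleted at this timestamp"

Replica = Dict[str, Tuple[int, Any]]   # key -> (timestamp, value or TOMBSTONE)


def repair(replicas: List[Replica]) -> None:
    """Synchronize replicas so each holds the newest cell for every key."""
    merged: Replica = {}
    for replica in replicas:
        for key, (ts, value) in replica.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    for replica in replicas:
        replica.clear()
        replica.update(merged)


def read(replica: Replica, key: str) -> Any:
    cell = replica.get(key)
    if cell is None or cell[1] is TOMBSTONE:
        return None        # missing and deleted look the same to the reader
    return cell[1]
```

If the delete had simply removed the cell on one replica, repair would have copied the old value back from a replica that missed the delete; the tombstone is what lets the deletion win.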
Admin Procedures and Tools: In this lesson, you’ll learn how to administer a Scylla cluster. It covers essential tools and procedures, best practices, common pitfalls, and tips for successfully running a cluster.
New Scylla Monitoring lesson: This extensive lesson includes nine topics, one lab, and two quizzes. It covers Scylla Monitoring, a full stack for monitoring a Scylla cluster and for alerting. The stack contains open source tools, including Prometheus and Grafana, as well as custom Scylla dashboards and tooling.
That’s a lot of new content for you to check out. Scylla University now covers most Scylla-related topics for Developers, DBAs, and Architects.
Visit Scylla University and start learning. It’s free, online, and self-paced!