INITIALIZATION Knowledgebase: ki-dev-large Base Query: why is the book titled a cool brisk walk through discrete math? Model: gpt-4o Use Curl?: None ================================================== **Elapsed Time: 0.00 seconds** ================================================== ROUTING Query type: summary ================================================== **Elapsed Time: 2.05 seconds** ================================================== RAG PARAMETERS Max Context To Include: 100 Lowest Score to Consider: 0.1 ================================================== **Elapsed Time: 0.14 seconds** ================================================== VECTOR SEARCH ALGORITHM TO USE Use MMR search?: True Use Similarity search?: False ================================================== **Elapsed Time: 0.11 seconds** ================================================== VECTOR SEARCH DONE ================================================== **Elapsed Time: 6.36 seconds** ================================================== PRIMER Primer: You are Simon, a highly intelligent personal assistant in a system called KIOS. You are a chatbot that can read knowledgebases through the "CONTEXT" that is included in the user's chat message. Your role is to act as an expert at summarization and analysis. In your responses to enterprise users, prioritize clarity, trustworthiness, and appropriate formality. Be honest by admitting when a topic falls outside your scope of knowledge, and suggest alternative avenues for obtaining information when necessary. Make effective use of chat history to avoid redundancy and enhance response relevance, continuously adapting to integrate all necessary details in your interactions. Use as many tokens as needed to provide a detailed response. 
================================================== **Elapsed Time: 0.19 seconds** ================================================== FINAL QUERY Final Query: CONTEXT: ########## File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 1 Context: # A Cool Brisk Walk Through Discrete Mathematics **Version 2.2.1** **Stephen Davies, Ph.D.** ## Table of Contents 1. Introduction 2. Concepts - Sets - Functions - Relations 3. Applications 4. Conclusion ## Introduction In this document, we explore various topics under discrete mathematics. ## Concepts ### Sets - Definition - Notation - Operations ### Functions - Definition - Types of functions - Examples ### Relations - Definition - Properties | Concept | Description | |-----------|-----------------------------------| | Set | A collection of distinct objects | | Function | A mapping from one set to another | | Relation | A connection between two sets | ## Applications The concepts of discrete mathematics can be applied in various fields, including computer science, information technology, and combinatorial design. ## Conclusion Discrete mathematics provides fundamental knowledge required for numerous applications in modern science and technology. Image Analysis: ### Analysis of the Attached Visual Content #### 1. Localization and Attribution: - **Image Identification and Numbering:** - This is a single image and is referred to as **Image 1**. #### 2. Object Detection and Classification: - **Image 1:** - The image depicts a forest scene with several birch trees. - **Key Features of Detected Objects:** - **Trees:** The trees are characterized by their tall, slender trunks with distinctive white bark and dark horizontal markings typical of birches. - **Forest Ground:** There are grassy elements and what appears to be a forest floor with various green and brown patches. #### 3. 
Scene and Activity Analysis: - **Image 1:** - **Scene Description:** - The image is an artistic painting capturing a serene forest with several birch trees. - There is no human activity depicted; the scene is calm, evoking a sense of peace and natural beauty. #### 4. Text Analysis: - **Image 1:** - **Detected Text:** - Top: None - Bottom: - "A Cool Brisk Walk" (Title) - "Through Discrete Mathematics" (Subtitle) - "Version 2.2.1" (Version Information) - "Stephen Davies, Ph.D." (Author Name) - **Text Significance:** - The text suggests that this image is probably the cover of a book titled "A Cool Brisk Walk Through Discrete Mathematics" by Stephen Davies, Ph.D. The version number indicates it is version 2.2.1, signifying a possibly updated or revised edition. #### 8. Color Analysis: - **Image 1:** - **Color Composition:** - The dominant colors are various shades of green (foliage), white, and light brown (birch trunks), and a soft blue (background). - **Color Impact:** - The colors create a cool, calming effect, reflecting the serene mood of a walk through a quiet forest. #### 9. Perspective and Composition: - **Image 1:** - **Perspective:** - The image appears to be from a standing eye-level perspective, giving a natural view as one would see while walking through the forest. - **Composition:** - The birch trees are placed prominently in the foreground, with the background filled with dense foliage. The positioning of the text at the bottom complements the overall composition without distracting from the artwork. #### 10. Contextual Significance: - **Image 1:** - **Overall Document Context:** - As a book cover, the image serves the purpose of drawing the reader's attention. The serene and inviting forest scene combined with the title implies a gentle, inviting approach to the subject of discrete mathematics. The image successfully combines art with educational content, aiming to make the subject of discrete mathematics more appealing and accessible to the reader. 
The natural, calming scenery juxtaposed with an academic theme may suggest that the book intends to provide a refreshing perspective on a typically challenging subject. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 2 Context: Copyright © 2023 Stephen Davies. ## University of Mary Washington **Department of Computer Science** James Farmer Hall 1301 College Avenue Fredericksburg, VA 22401 --- Permission is granted to copy, distribute, transmit and adapt this work under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/). The accompanying materials at [www.allthemath.org](http://www.allthemath.org) are also under this license. If you are interested in distributing a commercial version of this work, please contact the author at [stephen@umw.edu](mailto:stephen@umw.edu). The LaTeX source for this book is available from: [https://github.com/divilian/cool-brisk-walk](https://github.com/divilian/cool-brisk-walk). --- Cover art copyright © 2014 Elizabeth M. Davies. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 3 Context: # Contents at a glance - **Contents at a glance** …… i - **Preface** …… iii - **Acknowledgements** …… v 1. **Meetup at the trailhead** …… 1 2. **Sets** …… 7 3. **Relations** …… 35 4. **Probability** …… 59 5. **Structures** …… 85 6. **Counting** …… 141 7. **Numbers** …… 165 8. **Logic** …… 197 9. **Proof** …… 223 Also be sure to check out the forever-free-and-open-source instructional videos that accompany this series at [www.allthemath.org](http://www.allthemath.org)! 
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 5 Context: # Preface Discrete math is a popular book topic — start Googling around and you’ll find a zillion different textbooks about it. Take a closer look, and you’ll discover that most of these are pretty thick, dense volumes packed with lots of equations and proofs. They’re principled approaches, written by mathematicians and (seemingly) to mathematicians. I speak with complete frankness when I say I’m comforted to know that the human race is well covered in this area. We need smart people who can derive complex expressions and prove theorems from scratch, and I’m glad we have them. Your average computer science practitioner, however, might be better served by a different approach. There are elements to the discrete math mindset that a budding software developer needs experience with. This is why discrete math is (properly, I believe) part of the mandatory curriculum for most computer science undergraduate programs. But for future programmers and engineers, the emphasis should be different than it is for mathematicians and researchers in computing theory. A practical computer scientist mostly needs to be able to use these tools, not to derive them. She needs familiarity, and practice, with the fundamental concepts and the thought processes they involve. The number of times the average software developer will need to construct a proof in graph theory is probably near zero. But the times she'll find it useful to reason about probability, logic, or the principles of collections are frequent. I believe the majority of computer science students benefit most from simply gaining an appreciation for the richness and rigor of #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 6 Context: # PREFACE This material, what it means, and how it impacts their discipline. 
Becoming an expert theorem prover is not required, nor is deriving closed-form expressions for the sizes of trees with esoteric properties. Basic fluency with each topic area, and an intuition about when it can be applied, is the proper aim for most of those who would go forward and build tomorrow's technology. To this end, the book in your hands is a quick guided tour of introductory-level discrete mathematics. It's like a cool, brisk walk through a pretty forest. I point out the notable features of the landscape and try to instill a sense of appreciation and even of awe. I want the reader to get a feel for the lay of the land and a little exercise. If the student acquires the requisite vocabulary, gets some practice playing with the toys, and learns to start thinking in terms of the concepts here described, I will count it as a success. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 7 Context: # Acknowledgements A hearty thanks to Karen Anewalt, Crystal Burson, Prafulla Giri, Tayyar Hussain, Jennifer Magee, Veena Ravishankar, Jacob Shtabnoy, and a decade's worth of awesome UMW Computer Science students for their many suggestions and corrections to make this text better! #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 9 Context: # Chapter 1 ## Meetup at the trailhead Before we set out on our “cool, brisk walk,” let’s get oriented. What is discrete mathematics, anyway? Why is it called that? What does it encompass? And what is it good for? Let’s take the two words of the subject, in reverse order. First, **math**. When most people hear “math,” they think “numbers.” After all, isn’t math the study of quantity? And isn’t that the class where we first learned to count, add, and multiply? 
Mathematics certainly has its root in the study of numbers — specifically, the “natural numbers” (the integers from 1 on up) that fascinated the ancient Greeks. Yet math is broader than this, almost to the point where numbers can be considered a special case of something deeper. In this book, when we talk about trees, sets, or formal logic, there might not be a number in sight. Math is about **abstract, conceptual objects that have properties,** and the implications of those properties. An “object” can be any kind of “thought material” that we can define and reason about precisely. Much of math deals with questions like, “Suppose we defined a certain kind of thing that had certain attributes. What would be the implications of this, if we reasoned it all the way out?” The “thing” may or may not be numerical, whatever it turns out to be. Like a number, however, it will be crisply defined, have certain known aspects to it, and be capable of combining with other things in some way. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 10 Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD Fundamental to math is that it deals with the **abstract**. Abstract, which is the opposite of concrete, essentially means something that can’t be perceived with the senses. A computer chip is concrete: you can touch it, you can see it. A number is not; nor is a function, a binary tree, or a logical implication. The only way to perceive these things is with the power of the mind. We will write expressions and draw pictures of many of our mathematical structures in order to help visualize them, and nearly everything we study will have practical applications whereby the abstractness gets grounded in concreteness for some useful purpose. But the underlying mathematical entity remains abstract and ethereal — only accessible to the mind’s eye. 
We may use a pencil to form the figure “5” on a piece of paper, but that is only a concrete manifestation of the underlying concept of “five-ness.” Don’t mistake the picture or the symbol for the thing itself, which always transcends any mere physical representation. The other word in the name of our subject is **“discrete”** (not to be confused with “discreet,” which means something else entirely). The best way to appreciate what discrete means is to contrast it with its opposite, **continuous**. Consider the following list: | Discrete | Continuous | |-------------------------------|------------------------| | whole numbers (\( \mathbb{Z} \)) | real numbers (\( \mathbb{R} \)) | | int | double | | digital | analog | | quantum | continuum | | counting | measuring | | number theory | analysis | | \( \Sigma \) | \( \int \) | | \( - \) | \( \frac{d}{dx} \) | What do the left-hand entries have in common? They describe things that are measured in crisp, distinct intervals, rather than varying smoothly over a range. Discrete things jump suddenly from position to position, with rigid precision. If you’re 5 feet tall, you might some day grow to 5.3 feet; but though there might be 5 #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 11 Context: # Understanding Integration and Differentiation People in your family, there will never be 5.3 of them (although there could be 6 someday). The last couple of entries on this list are worth a brief comment. They are math symbols, some of which you may be familiar with. On the right side — in the continuous realm — are \( \int \) and \( \frac{d}{dx} \), which you’ll remember if you’ve taken calculus. They stand for the two fundamental operations of integration and differentiation. Integration, which can be thought of as finding “the area under a curve,” is basically a way of adding up a whole infinite bunch of numbers over some range. 
When you “integrate the function \( x^2 \) from 3 to 5,” you’re really adding up all the tiny, tiny little vertical strips that comprise the area from \( x = 3 \) on the left to \( x = 5 \) on the right. Its corresponding entry in the left column of the table is \( \Sigma \), which is just a short-hand for “sum up a bunch of things.” Integration and summation are equivalent operations; it’s just that when you integrate, you’re adding up all the (infinitely many) slivers across the real-line continuum. When you sum, you’re adding up a fixed sequence of entries, one at a time, like in a loop. \( \Sigma \) is just the discrete “version” of \( \int \). The same sort of relationship holds between ordinary subtraction (\(-\)) and differentiation (\(\frac{d}{dx}\)). If you’ve plotted a bunch of discrete points on \( x \)-\( y \) axes, and you want to find the slope between two of them, you just subtract their \( y \) values and divide by the \( x \) distance between them. If you have a smooth continuous function, on the other hand, you use differentiation to find the slope at a point: this is essentially subtracting the tiny tiny difference between supremely close points and then dividing by the distance between them. Don’t worry; you don’t need to have fully understood any of the integration or differentiation stuff I just talked about, or even to have taken calculus yet. I’m just trying to give you some feel for what “discrete” means, and how the dichotomy between discrete and continuous really runs through all of math and computer science. In this book, we will mostly be focusing on discrete values and structures, which turn out to be of more use in computer science. That’s partially because, as you probably know, computers #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 12 Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD themselves are discrete, and can only store and compute discrete values. 
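The Σ-versus-∫ analogy described above is easy to see in code. Here is a minimal Python sketch (my illustration, not from the book): summing \( f(x) = x^2 \) at the integer points 3 through 5 with a loop, versus approximating its integral from 3 to 5 by adding up many thin vertical strips.

```python
def discrete_sum(f, lo, hi):
    """Sigma: add up f at each integer point from lo to hi, one entry at a time."""
    return sum(f(x) for x in range(lo, hi + 1))

def integral(f, lo, hi, strips=100_000):
    """Integral: approximate the area under f with many thin vertical strips (a Riemann sum)."""
    width = (hi - lo) / strips
    return sum(f(lo + i * width) * width for i in range(strips))

f = lambda x: x ** 2
print(discrete_sum(f, 3, 5))        # 50  (9 + 16 + 25)
print(round(integral(f, 3, 5), 2))  # 32.67, close to the exact (5**3 - 3**3)/3
```

The only difference between the two functions is how finely they slice the range — which is exactly the discrete/continuous divide.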
There can be many of them — megabytes, gigabytes, terabytes — but each value stored is fundamentally comprised of bits, each of which has a value of either 0 or 1. This is unlike the human brain, by the way, whose neuronal synapses communicate based on the continuous quantities of chemicals present in their axons. So I guess “computer” and “brain” are another pair of entities we could add to our discrete vs. continuous list. There’s another reason, though, why discrete math is of more use to computer scientists than continuous math is, beyond just the bits-and-bytes thing. Simply put, computers operate algorithmically. They carry out programs in step-by-step, iterative fashion. First do this, then do that, then move on to something else. This mechanical execution, like the ticking of a clock, permeates everything the computer can do, and everything we can tell it to do. At a given moment in time, the computer has completed step 7, but not step 8; it has accumulated 38 values, but not yet 39; its database has exactly 15 entries in it, no more and no less; it knows that after accepting this friend request, there will be exactly 553 people in your set of friends. The whole paradigm behind reasoning about computers and their programs is discrete, and that’s why we computer scientists find different problems worth thinking about than most of the world did a hundred years ago. But it’s still math. It’s just discrete math. There’s a lot to come, so limber up and let me know when you’re ready to hit the road. ## 1.1 Exercises Use an index card or a piece of paper folded lengthwise, and cover up the right-hand column of the exercises below. Read each exercise in the left-hand column, answer it in your mind, then slide the index card down to reveal the answer and see if you’re right! For every exercise you missed, figure out why you missed it before moving on. 
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 13 Context: # 1.1 EXERCISES 1. **What’s the opposite of concrete?** Abstract. 2. **What’s the opposite of discrete?** Continuous. 3. **Consider a quantity of water in a glass. Would you call it abstract, or concrete? Discrete, or continuous?** Concrete, since it’s a real entity you can experience with the senses. Continuous, since it could be any number of ounces (or liters, or tablespoons, or whatever). The amount of water certainly doesn’t have to be an integer. (Food for thought: since all matter is ultimately comprised of atoms, are even substances like water discrete?) 4. **Consider the number 27. Would you call it abstract, or concrete? Discrete, or continuous?** Abstract, since you can’t see or touch or smell “twenty-seven.” Probably discrete, since it’s an integer, and when we think of whole numbers we think “discrete.” (Food for thought: in real life, how would you know whether I meant the integer “27” or the decimal number “27.00”? And does it matter?) 5. **Consider a bit in a computer’s memory. Would you call it abstract, or concrete? Discrete, or continuous?** Clearly it’s discrete. Abstract vs. concrete, though, is a little tricky. If we’re talking about the actual transistor and capacitor that’s physically present in the hardware, holding a tiny charge in some little chip, then it’s concrete. But if we’re talking about the value “1” that is conceptually part of the computer’s currently executing state, then it’s really abstract just like 27 was. In this book, we’ll always be talking about bits in this second, abstract sense. 6. **If math isn’t just about numbers, what else is it about?** Any kind of abstract object that has properties we can reason about. 
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 15 Context: # Chapter 2 ## Sets The place from which we'll start our walk is a body of mathematics called "set theory." Set theory has an amazing property: it's so simple and applicable that almost all the rest of mathematics can be based on it! This is all the more remarkable because set theory itself came along pretty late in the game (as things go) — it was singlehandedly invented by one brilliant man, Georg Cantor, in the 1870s. That may seem like a long time ago, but consider that by the time Cantor was born, mankind had already accumulated an immense wealth of mathematical knowledge: everything from geometry to algebra to calculus to prime numbers. Set theory was so elegant and universal, though, that after it was invented, nearly everything in math was redefined from the ground up to be couched in the language of sets. It turns out that this simple tool is an amazingly powerful way to reason about mathematical concepts of all flavors. Everything else in this book stands on set theory as a foundation. Cantor, by the way, went insane as he tried to extend set theory to fully encompass the concept of infinity. Don't let that happen to you! #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 18 Context: # CHAPTER 2. SETS Set theory? Fuzzy sets change this membership assumption: items can indeed be “partially in” a set. One could declare, for instance, that Dad is partially female, which means he’s only partially in the \( F \) set. That might or might not make sense for gender, but you can imagine that if we defined a set \( T \) of “the tall people,” such a notion might be useful. 
At any rate, this example illustrates a larger principle which is important to understand: in math, things are the way they are simply because we’ve decided it’s useful to think of them that way. If we decide there’s a different useful way to think about them, we can define new assumptions and proceed according to new rules. It doesn’t make any sense to say “sets are (or aren’t) really fuzzy”: because there is no “really.” All mathematics proceeds from whatever mathematicians have decided is useful to define, and any of it can be changed at any time if we see fit. ## 2.2 Defining sets There are two ways to define a set: extensionally and intensionally. I’m not saying there are two kinds of sets; rather, there are simply two ways to specify a set. To define a set extensionally is to list its actual members. That’s what we did when we said \( P = \{ Dad, Mom \} \) above. In this case, we’re not giving any “meaning” to the set; we’re just mechanically spelling out what’s in it. The elements Dad and Mom are called the **extension** of the set \( P \). The other way to specify a set is intensionally, which means to describe its meaning. Another way to think of this is specifying a rule by which it can be determined whether or not a given element is in the set. If I say “Let \( P \) be the set of all parents,” I am defining \( P \) intensionally. I haven’t explicitly said which specific elements of the set are in \( P \). I’ve just given the meaning of the set, from which you can figure out the extension. We call “parent-ness” the **intension** of \( P \). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 21 Context: 2.4 Sets are not arrays ======================= If you’ve done some computer programming, you might see a resemblance between sets and the collections of items often used in a program: arrays, perhaps, or linked lists. To be sure, there are some similarities. 
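The extensional/intensional distinction above maps neatly onto code. A small Python sketch (my analogy, with a hypothetical family as the domain of discourse — none of these names or flags come from the book):

```python
# Extensional: mechanically list the members.
P_ext = {"Dad", "Mom"}

# Intensional: give a rule ("parent-ness") that decides membership.
# The family dict below is a made-up domain of discourse.
family = {"Dad": True, "Mom": True, "Lizzy": False, "T.J.": False}

def is_parent(person):
    """The intension: a rule that says whether a given element is in the set."""
    return family[person]

# The extension can be recovered by applying the rule to the whole domain.
P_int = {person for person in family if is_parent(person)}
print(P_ext == P_int)  # True — the same set, specified two different ways
```

Either way you end up with the same set; only the manner of specifying it differs.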
But there are also some very important differences, which must not be overlooked: - **No order.** As previously mentioned, there is no order to the members of a set. “{Mom, Dad}” is the same set as “{Dad, Mom}”. In a computer program, of course, most arrays or lists have first, second, and last elements, and an index number assigned to each. - **No duplicates.** Suppose \( M \) is the set of all males. What would it possibly mean to say \( M = \{T.J., T.J., Johnny\} \)? Would that mean that “T.J. is twice the man that Johnny is”? This is obviously nonsensical. The set \( M \) is based on a property: maleness. Each element of \( M \) is either male, or it isn't. It can't be “male three times.” Again, in an array or linked list, you could certainly have more than one copy of the same item in different positions. - **Infinite sets.** 'Nuff said. I’ve never seen an array with infinitely many elements, and neither will you. - **Untyped.** Most of the time, an array or other collection in a computer program contains elements of only a single type; it’s an array of integers, or a linked list of Customer objects, for example. This is important because the program often needs to treat all elements in the collection the same way. Perhaps it needs to loop over the array to add up all the numbers, or iterate through a linked list and search for customers who have not placed an order in the last six months. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 23 Context: # 2.6 Sets of Sets Sets are heterogeneous — a single set can contain four universities, seven integers, and an image — and so it might occur to you that they can contain other sets as well. 
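The first two of the differences just listed — no order and no duplicates — can be demonstrated in a couple of lines with Python's built-in set and list types (a quick sketch of mine, not the book's code):

```python
# No order: listing the members differently yields the very same set...
assert {"Mom", "Dad"} == {"Dad", "Mom"}
# ...whereas lists (Python's arrays) care about position.
assert ["Mom", "Dad"] != ["Dad", "Mom"]

# No duplicates: an element is either in a set or it isn't.
M = {"T.J.", "T.J.", "Johnny"}
assert M == {"T.J.", "Johnny"}
assert len(M) == 2
# A list happily stores the same item twice.
assert len(["T.J.", "T.J.", "Johnny"]) == 3
print("all checks pass")
```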
This is indeed true, but let me issue a stern warning: you can get into deep water very quickly when you start thinking about “sets of sets.” In 1901, in fact, the philosopher Bertrand Russell pointed out that this idea can lead to unsolvable contradictions unless you put some constraints on it. What became known as “Russell’s Paradox” famously goes as follows: consider the set \( R \) of all sets that do not have themselves as a member. For this reason, we’ll be very careful to use curly braces to denote sets, and parentheses to denote ordered pairs. By the way, although the word “coordinate” is often used to describe the elements of an ordered pair, that’s really a geometry-centric word that implies a visual plot of some kind. Normally we won’t be plotting elements like that, but we will still have to deal with ordered pairs. I’ll just call the constituent parts “elements” to make it more general. Three-dimensional points need ordered triples \( (x, y, z) \), and it doesn’t take a rocket scientist to deduce that we could extend this to any number of elements. The question is what to call them, and you do sort of sound like a rocket scientist (or other generic nerd) when you say tuple. Some people rhyme this word with “Drupal,” and others with “couple,” by the way, and there seems to be no consensus. If you have an ordered pair-type thing with 5 elements, therefore, it’s a 5-tuple (or a quintuple). If it has 117 elements, it’s a 117-tuple, and there’s really nothing else to call it. The general term (if we don’t know or want to specify how many elements) is n-tuple. In any case, it’s an ordered sequence of elements that may contain duplicates, so it’s very different than a set. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 24 Context: # CHAPTER 2. SETS as members³. Now is \( R \) a member of itself, or isn’t it? Either way you answer turns out to be wrong (try it!) 
which means that this whole setup must be flawed at some level. The good news is that as long as you don’t deal with this kind of self-referential loop (“containing yourself as a member”) then it’s pretty safe to try at home. Consider this set: \[ V = \{ 3, 5, \{ 5, 4 \}, 2 \} \] This set has four (not five) members. Three of \( V \)’s members are integers: 2, 3, and 5. The other one is a set (with no name given). That other set, by the way, has two members of its own: 4 and 5. If you were asked, “is \( 4 \in V \)?” the answer would be no. As a corollary to this, there’s a difference between \[ \emptyset \] and \[ \{ \emptyset \} \] The former is a set with no elements. The latter is a set with one element: and that element just happens to be a set with nothing in it. ## 2.7 Cardinality When we talk about the number of elements in a set, we use the word cardinality. You’d think we could just call it the “size” of the set, but mathematicians sometimes like words that sound cool. The cardinality of \( M \) (the set of males, where the Davies family is the domain of discourse) is 3, because there are three elements in it. The cardinality of the empty set \( \emptyset \) is 0. The cardinality of the set of all integers is \( \aleph_0 \). Simple as that. 
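The \( V \) example, and the \( \emptyset \)-versus-\( \{\emptyset\} \) distinction, can both be checked directly. One Python-specific wrinkle (a language detail, not a math one): a set stored inside another set must be a frozenset.

```python
# Python requires an immutable frozenset for a set-inside-a-set.
inner = frozenset({5, 4})
V = {3, 5, inner, 2}

assert len(V) == 4       # four members, not five
assert 4 not in V        # 4 belongs to the inner set, not to V itself
assert inner in V        # the set {5, 4} *is* a member of V

# The empty set vs. the set containing the empty set:
assert len(frozenset()) == 0      # cardinality of the empty set is 0
assert len({frozenset()}) == 1    # this set has one element: the empty set
```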
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 25 Context: 2.8 Some special sets ===================== In addition to the empty set, there are symbols for some other common sets, including: - **Z** — the integers (positive, negative, and zero) - **N** — the natural numbers (positive integers and zero) - **Q** — the rational numbers (all numbers that can be expressed as an integer divided by another integer) - **R** — the real numbers (all numbers that aren't imaginary, even decimal numbers that aren't rational) The cardinality of all these sets is infinity, although, as I alluded to previously, \(\mathbb{R}\) is in some sense "greater than" \(\mathbb{N}\). For the curious, we say that \(\mathbb{N}\) is a countably infinite set, whereas \(\mathbb{R}\) is uncountably infinite. Speaking very loosely, this can be thought of this way: if we start counting up all the natural numbers, 0, 1, 2, 3, 4, ..., we will never get to the end of them. But at least we can start counting. With the real numbers, we can't even get off the ground. Where do you begin? Starting with 0 is fine, but then what's the "next" real number? Choosing anything for your second number inevitably skips a lot in between. Once you've digested this, I'll spring another shocking truth on you: \( |\mathbb{Q}| \) is actually equal to \( |\mathbb{N}| \), not greater than it as \( |\mathbb{R}| \) is. Cantor came up with an ingenious numbering scheme whereby all the rational numbers — including 3, −9, and \( \frac{1}{17} \) — can be listed off regularly, in order, just like the integers can. And so \( |\mathbb{Q}| = |\mathbb{N}| \neq |\mathbb{R}| \). This kind of stuff can blow your mind. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 30 Context: # CHAPTER 2. SETS Let \( X \) be the set of all known murderers. 
Let \( Y \) be the set of all known thieves. Again, it's best understood with a specific example. Let’s say you’re renting a house and want to make sure you don’t have any shady characters under the roof. Now, as a landlord, you don’t want any thieves or murderers renting your property. So who are you willing to rent to? Answer: if \( \Omega \) is the set of all people, you are willing to rent to \( \overline{X \cup Y} \). Why that? Because if you take \( X \cup Y \), that gives you all the undesirables: people who are either murderers or thieves (or both). You don’t want to rent to any of them. In fact, you want to rent to the complement of that set; namely, “anybody else.” Putting an overbar on that expression gives you all the non-thieves and non-murderers. Very well. But now look at the right-hand side of the equation. \( \overline{X} \) gives you the non-murderers. \( \overline{Y} \) gives you the non-thieves. Now in order to get acceptable people, you want to rent only to someone who’s in both groups. Put another way, they have to be both a non-thief and a non-murderer in order for you to rent to them. Therefore, they must be in the intersection of the non-thief group and the non-murderer group. Therefore, the two sides of this equation are the same. The other form of De Morgan’s law is stated by flipping the intersections and unions: \[ \overline{X \cap Y} = \overline{X} \cup \overline{Y} \] Work this one out for yourself using a similar example, and convince yourself it’s always true. Augustus De Morgan, by the way, was a brilliant 19th century mathematician with a wide range of interests. His name #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 33 Context: ```markdown ## 2.11. POWER SETS Don't lose sight. There are four elements to the power set of \( A \), each of which is one of the possible subsets. 
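Circling back to De Morgan for a moment: both forms of the law can be spot-checked on a tiny, made-up domain with Python's set operators (all names hypothetical — a sketch, not a proof):

```python
omega = {"Alice", "Bob", "Carol", "Dave"}   # domain of discourse: all people
X = {"Bob"}                                 # known murderers (hypothetical!)
Y = {"Bob", "Carol"}                        # known thieves (hypothetical!)

def comp(s):
    """Complement relative to the domain omega."""
    return omega - s

# Complement of a union equals intersection of complements...
assert comp(X | Y) == comp(X) & comp(Y)
# ...and complement of an intersection equals union of complements.
assert comp(X & Y) == comp(X) | comp(Y)

# The acceptable tenants: neither murderers nor thieves.
print(sorted(comp(X | Y)))  # ['Alice', 'Dave']
```

Checking every subset pair of a small domain this way is a handy sanity test, though of course it is no substitute for the set-algebra argument in the text.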
It might seem strange to talk about "all of the possible subsets" — when I first learned this stuff, I remember thinking that there would be no limit to the number of subsets you could make from a set. But of course there is. To create a subset, you can either include, or exclude, each one of the original set's members. In \( A \)'s case, you can either (1) include both Dad and Lizzy, or (2) include Dad but not Lizzy, or (3) include Lizzy but not Dad, or (4) exclude both, in which case your subset is empty. Therefore, \( P(A) \) includes all four of those subsets. Now what's the cardinality of \( P(X) \) for some set \( X \)? That's an interesting question, and one well worth pondering. The answer ripples through the heart of a lot of combinatorics and the binary number system, topics we'll cover later. And the answer is right at our fingertips if we just extrapolate from the previous example. To form a subset of \( X \), we have a choice to either include, or else exclude, each of its elements. So there's two choices for the first element, and then, whether or not we chose to include that first element, there are two choices for the second, etc. So if \( |X| = 2 \) (recall that this notation means \( X \) has two elements, i.e., a cardinality of 2), then its power set has \( 2 \times 2 \) members. If \( |X| = 3 \), then its power set has \( 2 \times 2 \times 2 \) members. In general: \[ |P(X)| = 2^{|X|}. \] As a limiting case (and a brain-bender), notice that if \( X \) is the empty set, then \( P(X) \) has one (not zero) members, because there is in fact one subset of the empty set: namely, the empty set itself. So \( |X| = 0 \), and \( |P(X)| = 1 \). And that jives with the above formula. ``` #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 36 Context: # CHAPTER 2: SETS might be double-majoring in both.
Hence, this group of subsets is neither mutually exclusive nor collectively exhaustive. It's interesting to think about gender and partitions: when I grew up, I was taught that males and females were a partition of the human race. But now I've come to realize that there are non-binary persons who do not identify with either of those genders, and so it's not a partition after all. ## Question: Is the number of students \( |S| \) equal to the number of off-campus students plus the number of on-campus students? Obviously yes. But why? The answer: because the off-campus and on-campus students form a partition. If we added up the number of freshmen, sophomores, juniors, and seniors, we would also get \( |S| \). But adding up the number of computer science majors and English majors would almost certainly not be equal to \( |S| \), because some students would be double-counted and others counted not at all. This is an example of the kind of beautiful simplicity that partitions provide. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 39 Context: # 2.13. EXERCISES 21. What's \( A \cap B \)? \{ Macbeth, Hamlet \}. 22. What's \( B \cap T \)? ∅. 23. What's \( B \cap \overline{T} \)? B. (which is \{ Scrabble, Monopoly, Othello \}.) 24. What's \( A \cup (B \cap T) \)? \{ Hamlet, Othello, Macbeth \}. 25. What's \( (A \cup B) \cap T \)? \{ Macbeth, Hamlet \}. 26. What's \( T \setminus B \)? Simply \( T \), since these two sets have nothing in common. 27. What's \( T \times A \)? \{ (Hamlet, Macbeth), (Hamlet, Hamlet), (Hamlet, Othello), (Village, Macbeth), (Village, Hamlet), (Village, Othello), (Town, Macbeth), (Town, Hamlet), (Town, Othello) \}. The order of the ordered pairs within the set is not important; the order of the elements within each ordered pair is important. 28. What's \( T \cap X \)? 0. 29. What's \( (B \cap B) \times (A \cap T) \)? \{ (Scrabble, Hamlet), (Monopoly, Hamlet), (Othello, Hamlet) \}.
30. What's \( |A \cup B \cup T| \)? 7. 31. What's \( |(A \cup B \cup T) \times (B \cap B)| \)? 21. (The first parenthesized expression gives rise to a set with 7 elements, and the second to a set with three elements (B itself). Each element from the first set gets paired with an element from the second, so there are 21 such pairings.) #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 40 Context: # CHAPTER 2. SETS ### 33. Is A an extensional set, or an intensional set? The question doesn't make sense. Sets aren't "extensional" or "intensional"; rather, a given set can be described extensionally or intensionally. The description given in item 19 is an extensional one; an intensional description of the same set would be "The Shakespeare tragedies Stephen studied in high school." ### 34. Recall that G was defined as { Matthew, Mark, Luke, John }. Is this a partition of G? - { Luke, Matthew } - { John } No, because the sets are not collectively exhaustive (Mark is missing). ### 35. Is this a partition of G? - { Mark, Luke } - { Matthew, Luke } No, because the sets are neither collectively exhaustive (John is missing) nor mutually exclusive (Luke appears in two of them). ### 36. Is this a partition of G? - { Matthew, Mark, Luke } - { John } Yes. (Trivia: this partitions the elements into the synoptic gospels and the non-synoptic gospels). ### 37. Is this a partition of G? - { Matthew, Luke } - { John, Mark } Yes. (This partitions the elements into the gospels which feature a Christmas story and those that don't). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 43 Context: # Chapter 3 ## Relations Sets are fundamental to discrete math, both for what they represent in themselves and for how they can be combined to produce other sets. In this chapter, we're going to learn a new way of combining sets, called relations.
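The partition checks in the exercises above (mutually exclusive plus collectively exhaustive) can be automated in a few lines. A sketch, using the gospels set G from those exercises:

```python
# A collection of subsets partitions a set when the subsets are
# pairwise disjoint (mutually exclusive) and their union is the
# whole set (collectively exhaustive).
from itertools import combinations

def is_partition(whole, parts):
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    exhaustive = set().union(*parts) == whole
    return pairwise_disjoint and exhaustive

G = {"Matthew", "Mark", "Luke", "John"}

assert not is_partition(G, [{"Luke", "Matthew"}, {"John"}])          # Mark missing
assert not is_partition(G, [{"Mark", "Luke"}, {"Matthew", "Luke"}])  # John missing; Luke repeated
assert is_partition(G, [{"Matthew", "Mark", "Luke"}, {"John"}])      # synoptic / non-synoptic
assert is_partition(G, [{"Matthew", "Luke"}, {"John", "Mark"}])      # Christmas story / not
```

The four assertions reproduce the answers to exercises 34 through 37 exactly.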
### 3.1 The Idea of a Relation A relation between a set \( X \) and a set \( Y \) is a subset of their Cartesian product. That one sentence packs in a whole heck of a lot, so spend a moment thinking deeply about it. Recall that \( X \times Y \) yields a set of ordered pairs, one for each combination of an element from \( X \) and an element from \( Y \). If \( X \) has 5 elements and \( Y \) has 4, then \( X \times Y \) is a set of 20 ordered pairs. To make it concrete, if \( X \) is the set \{ Harry, Ron, Hermione \} and \( Y \) is the set \{ Dr. Pepper, Mt. Dew \}, then \( X \times Y \) is: - \( (Harry, Dr. Pepper) \) - \( (Harry, Mt. Dew) \) - \( (Ron, Dr. Pepper) \) - \( (Ron, Mt. Dew) \) - \( (Hermione, Dr. Pepper) \) - \( (Hermione, Mt. Dew) \) Convince yourself that every possible combination is in there. I listed them out methodically to make sure I didn't miss any (all the Harrys first, with each drink in order, then all the Rons, etc.), but of course there's no order to the members of a set, so I could have listed them in any order. Now if I define a relation between \( X \) and \( Y \), I'm simply specifying that certain of these ordered pairs are in the relation, and certain #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 52 Context: # CHAPTER 3. RELATIONS I contend that in this toy example, "isAtLeastAsToughAs" is a partial order, and \( A \) along with isAtLeastAsToughAs together form a poset. I reason as follows. It's reflexive, since we started off by adding every dog with itself. It's antisymmetric, since we never add both \( (x, y) \) and \( (y, x) \) to the relation (unless \( x = y \)). And it's transitive, because if Rex is tougher than Fido, and Fido is tougher than Cuddles, this means that if Rex and Cuddles ever met, Rex would quickly establish dominance. (I'm no zoologist, and am not sure if the last condition truly applies with real dogs. But let's pretend it does.)
It's called a "partial order" because it establishes a partial, but incomplete, hierarchy among dogs. If we ask, "is dog X tougher than dog Y?" the answer is never ambiguous. We're never going to say, "well, dog X was superior to dog A, who was superior to dog Y... but then again, dog Y was superior to dog B, who was superior to dog X, so there's no telling which of X and Y is truly toughest." No: a partial order, because of its transitivity and antisymmetry, guarantees we never have such an irreconcilable conflict. However, we could have a lack of information. Suppose Rex has never met Killer, and nobody Rex has met has ever met anyone Killer has met. There's no chain between them. They're in two separate universes as far as we're concerned, and we'd have no way of knowing which was toughest. It doesn't have to be that extreme, though. Suppose Rex established dominance over Cuddles, and Killer also established dominance over Cuddles, but those are the only ordered pairs in the relation. Again, there's no way to tell whether Rex or Killer is the tougher dog. They'd either have to encounter a common opponent that only one of them can beat, or else get together for a throw-down. So a partial order gives us some semblance of structure — the relation establishes a directionality, and we're guaranteed not to get wrapped up in contradictions — but it doesn't completely order all the elements. If it does, it's called a total order. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 55 Context: # 3.6 FUNCTIONS Functions aren't necessarily numeric. Sometimes we do draw "plots" of sorts, though, like this one: [figure: an arrow diagram connecting each wizard in the left blob to a soft drink in the right blob] *Figure 3.1: A function represented graphically.* This simply shows which elements of the domain map to which elements of the codomain.
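The three partial-order properties the dog example above relies on (reflexive, antisymmetric, transitive) can be tested directly on a small relation. This is a sketch; the particular set of ordered pairs is invented to match the story:

```python
# Test reflexivity, antisymmetry, and transitivity of a relation on a set.
dogs = {"Rex", "Fido", "Cuddles", "Killer"}

# Hypothetical "isAtLeastAsToughAs" pairs: every dog with itself, plus
# Rex > Fido > Cuddles (with (Rex, Cuddles) added so transitivity holds).
R = {(d, d) for d in dogs} | {("Rex", "Fido"), ("Fido", "Cuddles"), ("Rex", "Cuddles")}

reflexive = all((d, d) in R for d in dogs)
antisymmetric = all(not ((x, y) in R and (y, x) in R)
                    for x in dogs for y in dogs if x != y)
transitive = all((x, z) in R
                 for (x, y) in R for (y2, z) in R if y == y2)

assert reflexive and antisymmetric and transitive   # so R is a partial order

# Killer is unrelated to everyone else, so the order is partial, not total.
total = all((x, y) in R or (y, x) in R for x in dogs for y in dogs)
assert not total
```

Adding enough pairs to relate every dog to every other (without breaking antisymmetry) would flip the last check and make the relation a total order.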
The left blob is the domain, the right blob is the codomain, and there's an arrow representing each mapping. Now as with relations, functions normally have "meaning." We could define a function called `firstTasted` that associates each wizard with the soft drink he or she first sampled as a child. We could define another called `faveDrink` that maps each wizard to his or her favorite — presuming that every wizard has a favorite drink in the set (Hermione will have to overlook her iced tea and choose among the options provided). A third function called `wouldChooseWithMexicanFood` indicates which drink each wizard would choose with that type of cuisine. Here are Ron's values for each of the three functions: - `firstTasted(Ron) = Mt. Dew` - `faveDrink(Ron) = Mt. Dew` - `wouldChooseWithMexicanFood(Ron) = Dr. Pepper` These values indicate that Mt. Dew was the soda pop that Ron first sipped, and it has been his favorite ever since, although at La Estrellita he prefers a Pepper. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 60 Context: # CHAPTER 3. RELATIONS own wizard, and no soft drinks (or wizards) are left out. How exciting! This is a perfectly bijective function, also called a **bijection**. Again, the only way to get a bijection is for the domain and codomain to be the same size (although that alone does not guarantee a bijection; witness \( f \), above). Also observe that if they are the same size, then injectivity and surjectivity go hand-in-hand. Violate one, and you're bound to violate the other. Uphold the one, and you're bound to uphold the other. There's a nice, pleasing, symmetrical elegance to the whole idea. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 64 Context: # CHAPTER 3. RELATIONS 22. **Is sameShirtColor transitive?** Yes.
If Kirk and Sulu wear the same color (yellow), and Sulu and Chekov wear the same color (yellow), then Kirk and Chekov most certainly will wear the same color (yellow). 23. **Above, we defined A as the set {Chuck, Julie, Sam} and S as the set {basketball, volleyball}. Then we defined the relation { (Julie, basketball), (Sam, basketball), (Julie, volleyball) }. Is this relation a function?** No, because it's missing Chuck entirely. 24. **Suppose we added the ordered pair (Chuck, basketball) to it. Now is it a function?** No, because Julie appears twice, mapping to two different values. 25. **Okay. Suppose we then remove (Julie, volleyball). We now have { (Julie, basketball), (Sam, basketball), (Chuck, basketball) }. Is this a function?** Yes. Congratulations. 26. **Let's call this function "faveSport", which suggests that its meaning is to indicate which sport is each athlete's favorite. What's the domain of faveSport?** {Julie, Chuck, Sam}. 27. **What's the codomain of faveSport?** {basketball, volleyball}. 28. **What's the range of faveSport?** {basketball}. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 67 Context: # Chapter 4 ## Probability Probability is the study of **uncertainty**. This may seem like a hopeless endeavor, sort of like knowing the unknowable, but it's not. The study of probability gives us tools for taming the uncertain world we live and program in, and for reasoning about it in a precise and helpful way. We may not know exactly how long a particular visitor is willing to wait for our webpage to load in their browser, but we can use probability to estimate how much traffic we’ll lose if this takes longer than a certain average duration. We may not know which specific passwords a hacker will try as he attempts to break our security protocol, but we can use probability to estimate how feasible this approach will be for him. 
We may not know exactly when a certain program will run out of RAM and have to swap its data out to virtual memory, but we can predict how often this is likely to occur — and how painful it will be for us — given a certain system load and user behavior. The trick is to use the tools we've already built — sets, relations, functions — to characterize and structure our notions of the relative likelihood of various outcomes. Once those underpinnings are secured, a layer of deductive reasoning will help us make good use of that information to begin to predict the future. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 68 Context: # 4.1 Outcomes and events Since life is uncertain, we don’t know for sure what is going to happen. But let’s start by assuming we know what things might happen. Something that might happen is called an outcome. You can think of this as the result of an experiment if you want to, although normally we won’t be talking about outcomes that we have explicitly manipulated and measured via scientific means. It’s more like we’re just curious how some particular happening is going to turn out, and we’ve identified the different ways it can turn out and called them outcomes. Now we’ve been using the symbol \( \Omega \) to refer to “the domain of discourse” or “the universal set” or “all the stuff we’re talking about.” We’re going to give it yet another name now: the sample space. \( \Omega \), the sample space, is simply the *set of all possible outcomes*. Any particular outcome — call it \( O \) — is an element of this set, just like in Chapter 1 every conceivable element was a member of the domain of discourse. If a woman is about to have a baby, we might define \( \Omega \) as \{ boy, girl \}. Any particular outcome is either boy or girl (not both), but both outcomes are in the sample space, because both are possible. 
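Since a sample space is just a set of outcomes and an event is just a subset of it, the birth example above can be modeled directly with the set machinery from Chapter 2. A minimal sketch (the variable names are mine):

```python
# Sample space for the birth example: the set of all possible outcomes.
omega = {"boy", "girl"}

# Any particular outcome is an element of omega...
outcome = "girl"
assert outcome in omega

# ...and an event is any subset of omega: possibly one outcome, or even zero.
event_girl = {"girl"}
event_any = omega       # the certain event
event_none = set()      # the impossible event
assert event_girl <= omega and event_any <= omega and event_none <= omega
```

The `<=` operator here is Python's subset test, mirroring the definition of an event as a subset of \( \Omega \).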
If we roll a die, we'd define \( \Omega \) as \{ 1, 2, 3, 4, 5, 6 \}. If we're interested in motor vehicle safety, we might define \( \Omega \) for a particular road trip as \{ safe, accident \}. The outcomes don't have to be equally likely, an important point we'll return to soon. In probability, we define an event as a *subset of the sample space*. In other words, an event is a group of related outcomes (though an event might contain just one outcome, or even zero). I always thought this was a funny definition for the word "event": it's not the first thing that word brings to mind. But it turns out to be a useful concept, because sometimes we're not interested in any particular outcome necessarily, but rather in whether the outcome — whatever it is — has a certain property. For instance, suppose at the start of some game, my opponent and I each roll the die, agreeing that the highest roller gets to go first. Suppose he rolls a 2. Now it's my turn. The \( \Omega \) for my die roll is of course \{ 1, 2, 3, 4, 5, 6 \}. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 73 Context: # 4.2. PROBABILITY MEASURES If two events are mutually exclusive, then there is no intersection to subtract out. That's a special case of rule 3. For example, suppose I defined the event \( C \) as a country winner winning the competition. In this case, \( C \) contains only one outcome: Carrie. Therefore \( U \) and \( C \) are mutually exclusive. So if I asked "what's the probability of an underage or country winner?" we'd compute \( Pr(U \cup C) \) as: \[ Pr(U \cup C) = Pr(U) + Pr(C) - Pr(U \cap C) = 0.4 + 0.1 - 0 = 0.5 \] We didn't double-count anything, so there was no correction to make. Here are a few more pretty obvious rules for probability measures, which follow logically from the first 3: 1. \( Pr(\emptyset) = 0 \) 2. \( Pr(A) = 1 - Pr(\overline{A}) \) (recall the "total complement" operator from p. 18.) 3.
\( Pr(A) \leq Pr(B) \) if \( A \subseteq B \) Finally, let me draw attention to a common special case of the above rules, which is the situation in which all outcomes are equally likely. This usually happens when we roll dice, flip coins, deal cards, etc., since the probability of rolling a 3 is (normally) the same as rolling a 6, and the probability of being dealt the 10♠ is the same as the Q♣. It may also happen when we generate encryption keys, choose between alternate network routing paths, or determine the initial positions of baddies in a first-person shooter level. In this case, if there are \( N \) possible outcomes (note \( N = |\Omega| \)), then the probability of any event \( A \) is: \[ Pr(A) = \frac{|A|}{N} \] #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 75 Context: # 4.3. PHILOSOPHICAL INTERLUDE The idea of how often heads will occur in the long run. But if we flip it a million times and get 500,372 heads, we can confidently say that the probability of getting a head on a single flip is approximately .500. This much isn't controversial: it's more like common sense. But the frequentist philosophy states that this is really the only way that probability can be defined. It's what probability is: the frequency with which we can expect certain outcomes to occur, based on our observations of their past behavior. Probabilities only make sense for things that are repeatable, and reflect a known, reliable trend in how often they produce certain results. Historical proponents of this philosophy include John Venn, the inventor of the aforementioned Venn diagram, and Ronald Fisher, one of the greatest biologists and statisticians of all time. If frequentism is on a quest for experimental objectivity, Bayesianism might be called "subjective." This isn't to say it's arbitrary or sloppy. It simply has a different notion of what probability ultimately means.
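The equally-likely formula \( Pr(A) = |A|/N \) above can be sketched for the fair-die case, using exact fractions so the probability-measure rules check out without rounding error:

```python
from fractions import Fraction

# Equally likely outcomes: Pr(A) = |A| / N, where N = |Omega|.
omega = {1, 2, 3, 4, 5, 6}          # sample space for a fair die
N = len(omega)

def pr(event):
    """Probability of an event (a subset of omega) under equal likelihood."""
    return Fraction(len(event), N)

evens = {2, 4, 6}
at_least_five = {5, 6}

assert pr(evens) == Fraction(1, 2)
assert pr(at_least_five) == Fraction(1, 3)

# The union rule, with the inclusion-exclusion correction for the overlap {6}:
assert pr(evens | at_least_five) == pr(evens) + pr(at_least_five) - pr(evens & at_least_five)
```

The same `pr` function also satisfies the other rules listed above, e.g. `pr(set()) == 0` and `pr(omega) == 1`.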
Bayesians interpret probability as a quantitative personal assessment of the likelihood of something happening. They point out that for many (most) events of interest, trials are neither possible nor sensible. Suppose I’m considering asking a girl out to the prom, and I’m trying to estimate how likely it is she’ll go with me. It’s not like I’m going to ask her a hundred times and count how many times she says yes, then divide by 100 to get a probability. There is in fact no way to perform a trial or use past data to guide me, and at any rate she’s only going to say yes or no once. So based on my background knowledge and my assumptions about her, myself, and the world, I form an opinion which could be quantified as a “percent chance.” Once I’ve formed this opinion (which of course involves guesswork and subjectivity) I can then reason about it mathematically, using all the tools we’ve been developing. Of special interest to Bayesians is the notion of updating probabilities when new information comes to light, a topic we’ll return to in a moment. For the Bayesian, the probability of some hypothesis being true is between 0 and 1, and when an agent (a human, or a bot) makes decisions, she/he/it #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 88 Context: # CHAPTER 4. PROBABILITY Pr(L|F). Considering only the women, then, you compute Pr(L|F) = \(\frac{22}{198} \approx .111\) from the data in the table. Wait a minute. That’s exactly what we had before. Learning that we had chosen a woman told us nothing useful about her handedness. That’s what we mean by saying that the L and F events are independent of each other. The shrewd reader may object that this was a startling coincidence: the numbers worked out exactly perfectly to produce this result. The proportion of left-handed females was precisely the same as that of left-handed males, down to the penny. 
Is this really likely to occur in practice? And if not, isn't independence so theoretical as to be irrelevant? There are two ways of answering that question. The first is to admit that in real life, of course, we're bound to get some noise in our data, just because the sample is finite and there are random fluctuations in who we happen to survey. For the same reason, if we flipped an ordinary coin 1,000 times, we aren't likely to get exactly 500 heads. But that doesn't mean we should rush to the conclusion that the coin is biased. Statisticians have sophisticated ways of answering this question by computing how much the experimental data needs to deviate from what we'd expect before we raise a red flag. Suffice to say here that even if the contingency table we collect isn't picture perfect, we may still conclude that two events are independent if they're "close enough" to independence. The other response, though, is that yes, the burden of proof is indeed on independence, rather than on non-independence. In other words, we shouldn't start by cavalierly assuming all the events we're considering are in fact independent, and only changing our mind if we see unexpected correlations between them. Instead, we should always be suspicious that two events will affect each other in some way, and only conclude they're independent if the data we collect works out more or less "evenly" as in the example above. To say that Pr(A|B) is the same as Pr(A) is an aggressive statement, outside the norm, and we shouldn't assume it without strong evidence. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 91 Context: # 4.8 EXERCISES 9. Okay, so we have the probabilities of our four swimmers Ben, Chad, Grover, and Tim each winning the heat at .1, .2, .4, and .3, respectively. Now suppose Ben, Chad, and Grover are UWM athletes; Tim is from Marymount. Ben and Tim are juniors, and Chad and Grover are sophomores.
We'll define: \( U = \{ \text{Ben, Chad, Grover} \} \) \( M = \{ \text{Tim} \} \) \( J = \{ \text{Ben, Tim} \} \) \( S = \{ \text{Chad, Grover} \} \) What's \( P(U) \)? .7 10. What's \( P(J) \)? .4 11. What's \( P(U^c) \)? .3 (which is \( 1 - P(U) \), of course). 12. What's \( P(J \cup S) \)? Exactly 1. All of the outcomes are represented in the two sets \( J \) and \( S \). (Put another way, all competitors are juniors or sophomores.) 13. What's \( P(J \cap S) \)? Zero. Sets \( J \) and \( S \) have no elements in common; therefore their intersection is a set with no outcomes, and the probability of a non-existent outcome happening is 0. (Put another way, nobody is both a junior and a sophomore.) 14. What's the probability of a UWM junior winning the heat? This is \( P(J \cap U) \), which is the probability that the winner is a junior and a UWM student. Since \( U \cap J = \{ \text{Ben} \} \), the answer is .1. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 95 Context: # 5.1. GRAPHS The edges, we have precisely the structure of a graph. Psychologists have given this another name: a semantic network. It is thought that the myriad of concepts you have committed to memory — Abraham Lincoln, and bar of soap, and my fall schedule, and perhaps millions of others — are all associated in your mind in a vast semantic network that links the related concepts together. When your mind recalls information, or deduces facts, or even drifts randomly in idle moments, it's essentially traversing this graph along the various edges that exist between vertices. That's deep. But you don't have to go near that deep to see the appearance of graph structures all throughout computer science. What's MapQuest, if not a giant graph where the vertices are travelable locations and the edges are routes between them? What's Facebook, if not a giant graph where the vertices are people and the edges are friendships?
What's the World Wide Web, if not a giant graph where the vertices are pages and the edges are hyperlinks? What's the Internet, if not a giant graph where the vertices are computers or routers and the edges are communication links between them? This simple scheme of linked vertices is powerful enough to accommodate a whole host of applications, which is why it's worth studying. ## Graph Terms The study of graphs brings with it a whole bevy of new terms which are important to use precisely: 1. **Vertex.** Every graph contains zero or more vertices.¹ (These are also sometimes called nodes, concepts, or objects.) 2. **Edge.** Every graph contains zero or more edges. (These are also sometimes called links, connections, associations, or relationships.) Each edge connects exactly two vertices, unless the edge connects a vertex to itself, which is possible, believe it or not. > ¹ The phrase "zero or more" is common in discrete math. In this case, it indicates that the empty graph, which contains no vertices at all, is still a legitimate graph.
Muhammad Ali had a higher out-degree (3) than in-degree (1) since he won most of the time. ## Cycle A cycle is a path that begins and ends at the same vertex. In Figure 5.3, Richmond-to-Virginia Beach-to-Fredericksburg-to-Richmond is a cycle. Any loop is a cycle as well. For directed graphs, the entire loop must comprise edges in the "forward" direction: no fair going backwards. In Figure 5.2, Frazier-to-Ali-to-Foreman-to-Frazier is a cycle, as is the simpler Ali-to-Frazier-to-Ali. ## DAG (Directed Acyclic Graph) One common use of graphs is to represent flows of dependencies, for instance the prerequisites that different college courses have for one another. Another example is project management workflows: the tasks needed to complete a project become vertices, and then the dependencies they have on one another become edges. The graph in Figure 5.4 shows the steps in making a batch of brownies, and how these steps depend on each other. The eggs have to be cracked before the ingredients can be mixed. > We'll also say that a cycle can't repeat any edges or vertices along the way, so that it can't go back and forth repeatedly and pointlessly between two adjacent nodes. Some mathematicians call this a simple cycle to distinguish it from the more general cycle, but we'll just say that no cycles can repeat like this. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 101 Context: # 5.1. GRAPHS ## Relationship to Sets We seem to have strayed far afield from sets with all this graph stuff. But actually, there are some important connections to be made to those original concepts. Recall the wizards set \( A \) from chapter 3 that we extended to contain \{ Harry, Ron, Hermione, Neville \}.
Now consider the following endorelation on \( A \): - (Harry, Ron) - (Ron, Harry) - (Ron, Hermione) - (Ron, Neville) - (Hermione, Hermione) - (Neville, Harry) This relation, and all it contains, is represented faithfully by the graph in Figure 5.6. The elements of \( A \) are the vertices, of course, and each ordered pair of the relation is reflected in an edge of the graph. Can you see how exactly the same information is represented by both forms? *Figure 5.6: A graph depicting an endorelation.* Figure 5.6 is a directed graph, of course. What if it were an undirected graph? The answer is that the corresponding relation would be symmetric. An undirected graph implies that if there's an edge between two vertices, it goes "both ways." This is identical to saying a relation is symmetric: if \( (x, y) \) is in the relation, then the corresponding \( (y, x) \) must also be. An example is Figure 5.7, which depicts the following symmetric relation: *Figure 5.7: A graph depicting a symmetric relation.* #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 103 Context: # 5.1 GRAPHS { Harry } and { Ron, Hermione, Neville }. In other words, the "connectedness" of a graph can be represented precisely as a partition of the set of vertices. Each connected subset is in its own group, and every vertex is in one and only one group; therefore, these isolated groups are mutually exclusive and collectively exhaustive. Cool. ## Graph Traversal If you had a long list — perhaps of phone numbers, names, or purchase orders — and you needed to go through and do something to each element of the list — dial all the numbers, scan the list for a certain name, add up all the orders — it'd be pretty obvious how to do it. You just start at the top and work your way down. It might be tedious, but it's not confusing.
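The correspondence described above (an endorelation is the edge set of a directed graph, and a symmetric relation corresponds to an undirected graph) can be expressed in a few lines. A sketch, using the wizard endorelation from the text:

```python
# An endorelation on A is a set of ordered pairs, i.e., the edge set
# of a directed graph whose vertices are the elements of A.
A = {"Harry", "Ron", "Hermione", "Neville"}
R = {("Harry", "Ron"), ("Ron", "Harry"), ("Ron", "Hermione"),
     ("Ron", "Neville"), ("Hermione", "Hermione"), ("Neville", "Harry")}

# Every pair must relate elements of A (every edge connects vertices of the graph).
assert all(x in A and y in A for (x, y) in R)

# The relation corresponds to an undirected graph exactly when it is symmetric.
def is_symmetric(rel):
    return all((y, x) in rel for (x, y) in rel)

# (Harry, Ron) and (Ron, Harry) are both present, but (Ron, Hermione)
# has no (Hermione, Ron) partner, so this particular relation is not symmetric.
assert not is_symmetric(R)
```

Adding the missing reversed pairs would make `is_symmetric(R)` true, which is precisely the move from Figure 5.6 to an undirected graph like Figure 5.7.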
Iterating through the elements like this is called **traversing** the data structure. You want to make sure you encounter each element once (and only once) so you can do whatever needs to be done with it. It’s clear how to traverse a list. But how to traverse a graph? There is no obvious “first” or “last” node, and each one is linked to potentially many others. And as we’ve seen, the vertices might not even be fully connected, so a traversal path through all the nodes might not even exist. There are two different ways of traversing a graph: **breadth-first** and **depth-first**. They provide different ways of exploring the nodes, and as a side effect, each is able to discover whether the graph is connected or not. Let’s look at each in turn. ### Breadth-First Traversal With breadth-first traversal, we begin at a starting vertex (it doesn’t matter which one) and explore the graph cautiously and delicately. We probe equally deep in all directions, making sure we’ve looked a little ways down each possible path before exploring each of those paths a little further. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 104 Context: ```markdown # CHAPTER 5. STRUCTURES To do this, we use a very simple data structure called a **queue**. A queue is simply a list of nodes that are waiting in line. (In Britain, I'm told, instead of saying “line up” at the sandwich shop, they say “queue up.”) When we enter a node into the queue at the tail end, we call it **enqueueing** the node, and when we remove one from the front, we call it **dequeuing** the node. The nodes in the middle patiently wait their turn to be dealt with, getting closer to the front every time the front node is dequeued. An example of this data structure in action is shown in Figure 5.8. Note carefully that we always insert nodes at one end (on the right) and remove them from the other end (the left). 
This means that the first item to be enqueued (in this case, the triangle) will be the first to be dequeued. "Calls will be answered in the order they were received." This fact has given rise to another name for a queue: a **FIFO**, which stands for "first-in-first-out." ### Start with an empty queue: - Enqueue a triangle, and we have: △ - Enqueue a star, and we have: △⭐ - Enqueue a heart, and we have: △⭐❤️ - Dequeue the triangle, and we have: ⭐❤️ - Enqueue a club, and we have: ⭐❤️♣️ - Dequeue the star, and we have: ❤️♣️ - Dequeue the heart, and we have: ♣️ - Dequeue the club. We're empty again: **Figure 5.8:** A queue in action. The vertical bar marks the "front of the line," and the elements are waiting to be dequeued in order from left to right. Now here's how we use a queue to traverse a graph breadth-first. We're going to start at a particular node and put all of its adjacent nodes into a queue. This makes them all safely "wait in line" until we get around to exploring them. Then, we repeatedly take the first node in line, do whatever we need to do with it, and then put all of its adjacent nodes in line. We keep doing this until the queue is empty. ``` #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 109 Context: # Graphs ## Depth-first traversal (DFT) 1. Choose a starting node. 2. Mark it and push it on an empty stack. 3. While the stack is not empty, do these steps: - a) Pop the top node off the stack. - b) Visit it. - c) Mark and push all of its unmarked adjacent nodes (in any order). The algorithm in action is shown in Figure 5.11. The stack really made a difference! Instead of alternately exploring Chuck's and Izzy's paths, it bullheadedly darts down Chuck's path as far as it can go, all the way to hitting Izzy's back door. Only then does it back out and visit Izzy.
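Both traversals can be written with one skeleton: the only difference is whether the "waiting" structure is a FIFO queue (breadth-first) or a LIFO stack (depth-first). A minimal sketch, not the book's code, using an adjacency-list representation of an invented seven-node ring:

```python
from collections import deque

# One skeleton, two traversals: a FIFO queue gives breadth-first order,
# a LIFO stack gives depth-first order.
def traverse(graph, start, depth_first=False):
    marked = {start}
    frontier = deque([start])       # nodes "waiting in line"
    visited = []
    while frontier:
        # stack: pop what was just pushed; queue: dequeue from the front
        node = frontier.pop() if depth_first else frontier.popleft()
        visited.append(node)        # "visit" the node
        for neighbor in graph[node]:
            if neighbor not in marked:
                marked.add(neighbor)
                frontier.append(neighbor)
    return visited

# A small ring of nodes (invented here, not one of the book's figures):
graph = {"P": ["Q", "O"], "Q": ["P", "R"], "R": ["Q", "S"],
         "S": ["R", "T"], "T": ["S", "N"], "N": ["T", "O"], "O": ["N", "P"]}
print(traverse(graph, "P"))                    # → ['P', 'Q', 'O', 'R', 'N', 'S', 'T']
print(traverse(graph, "P", depth_first=True))  # → ['P', 'O', 'N', 'T', 'S', 'R', 'Q']
```

Note how the queue explores "a little ways down each path" in rings around P, while the stack darts all the way around one side of the ring before touching the other.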
This is because the stack always pops off what it just pushed on, whereas whatever got pushed first has to wait until everything else is done before it gets its chance. That first couple of pushes was critical: if we had pushed Chuck before Izzy at the very beginning, then we would have explored Izzy's entire world before arriving at Chuck's back door, instead of the other way around. As it is, Izzy got put on the bottom, and so she stayed on the bottom, which is inevitable with a stack. DFT identifies disconnected graphs in the same way as BFT, and it similarly avoids getting stuck in infinite loops when it encounters cycles. The only difference is the order in which it visits the nodes. ## Finding the shortest path We'll look at two other important algorithms that involve graphs, specifically weighted graphs. The first one is called **Dijkstra's shortest-path algorithm**. This is a procedure for finding the shortest path between two nodes, if one exists. It was invented in 1956 by the legendary computer science pioneer Edsger Dijkstra.
If it's less than the current tentative distance, update the tentative distance with this new value, and mark an arrow on the path from the current node to the neighbor (erasing any other arrow to the neighbor). c) **Mark the current node as visited.** (Its distance and best path are now fixed.) Don't worry, this isn't as hard as it sounds. But you do have to have your wits about you and carefully update all the numbers. Let's see it in action for WWII France. In the first frame of Figure 5.13, #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 113 Context: 5.1. GRAPHS ================================= We've marked each node with a diamond containing the tentative shortest distance to it from Bordeaux. This is 0 for Bordeaux itself (since it's 0 kilometers away from itself, duh), and infinity for all the others, since we haven't explored anything yet, and we want to start off as pessimistically as possible. We'll update these distances to lower values as we find paths to them. We start with Bordeaux as the "current node," marked in grey. In frame 2, we update the best-possible-path and the distance-of-that-path for each of Bordeaux's neighbors. Nantes, we discover, is no longer "infinity away," but a mere 150 km away, since there is a direct path to it from Bordeaux. Vichy and Toulouse are similarly updated. Note the heavy arrowed lines on the diagram, showing the best path (so far) to each of these cities from Bordeaux. Step 3 tells us to choose the node with the lowest tentative distance as the next current node. So for frame 3, Nantes fits the bill with a (tentative) distance of 150 km. It has only one unmarked neighbor, Paris, which we update with 450 km. Why 450? Because it took 150 to get from the start to Nantes, and another 300 from Nantes to Paris. After updating Paris, Nantes is marked as visited — we know we'll never encounter a better route to it than from Bordeaux directly.
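The numbered procedure above can be sketched in code. This sketch is mine, not the book's: it uses a priority queue (heap) in place of the "identify the unvisited node with the smallest tentative distance" scan, which computes the same result, and it runs on a small one-way fragment of the Bordeaux/Nantes/Paris distances just described.

```python
import heapq

# A sketch of Dijkstra's algorithm. The heap stands in for "find the
# unvisited node with the smallest tentative distance"; `prev` records
# the "arrows" marking the best path found so far to each node.
def dijkstra(graph, start):
    dist = {node: float("inf") for node in graph}   # tentative distances
    dist[start] = 0
    prev = {}
    visited = set()
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)                           # distance now fixed
        for neighbor, weight in graph[node]:
            alt = d + weight                        # current distance + edge
            if alt < dist[neighbor]:                # found a better path
                dist[neighbor] = alt
                prev[neighbor] = node
                heapq.heappush(heap, (alt, neighbor))
    return dist, prev

# A fragment of the map, with one-way edges for brevity:
graph = {"Bordeaux": [("Nantes", 150), ("Vichy", 450)],
         "Nantes": [("Paris", 300)], "Vichy": [], "Paris": []}
dist, prev = dijkstra(graph, "Bordeaux")
print(dist["Paris"], prev["Paris"])  # → 450 Nantes
```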
Frame 4 is our first time encountering a node that already has a non-infinite tentative distance. In this case, we don't further update it, because our new opportunity (Bordeaux-to-Toulouse-to-Vichy) is 500 km, which is longer than going from Bordeaux to Vichy direct (450 km). Lyon is updated as normal. We now have two unmarked nodes that tie for shortest tentative distance: Paris and Vichy (450 km each). In this case, it doesn't matter which one we choose. We'll pick Vichy for no particular reason. Frame 5 then shows some interesting activity. We do not update the path to Paris, since it would be 800 km through Vichy, whereas Paris already had a much better 450 km path. Marseille is updated from infinity to 850 km, since we found our first path to it. But Lyon is the really interesting case. It already had a path — Bordeaux-to-Toulouse-to-Lyon — but that path was 800 km, and we have just found a better path: Bordeaux-to-Vichy-to-Lyon, which only
To find this, we’ll use **Prim’s algorithm**, a technique named for the somewhat obscure computer scientist Robert Prim who developed it in 1957, although it had already been discovered much earlier (1930, by the Czech mathematician Vojtěch Jarník). Prim’s algorithm turns out to be much easier to carry out than Dijkstra’s algorithm, which I find surprising, since it seems to be solving a problem that’s just as hard. But here’s all you do: ## Prim’s minimal connecting edge set algorithm 1. Choose a node, any node. 2. While not all the nodes are connected, do these steps: 1. Identify the node closest to the already-connected nodes, and connect it to those nodes via the shortest edge. That’s it. Prim’s algorithm is an example of a **greedy algorithm**, which means that it always chooses the immediately obvious short-term best choice available. Non-greedy algorithms can say, “although node X would give the highest short-term satisfaction, I can look ahead and see that choosing Y instead will lead to a better overall result in the long run.” Greedy algorithms, by contrast, always gobble up what seems best at the time. That’s what Prim’s algorithm is doing in step 2a. It looks for the non-connected node. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 118 Context: ```markdown # Chapter 5: Structures ## Figure 5.14 The stages of Prim's minimal connecting edge set algorithm. Heavy lines indicate edges that have been (irrevocably) added to the set. | Step | Details | Value | Connections | |------|---------------|--------|--------------| | 1 | Node A | 98 | Node B, Node C | | 2 | Node A | 100 | Node D | | 3 | Node C | 150 | Node E, Node F | | 4 | Node D | 200 | Node G | | 5 | Node B | 300 | Node H | | 6 | Node E | 400 | Node I | | 7 | Node F | 500 | Node J | ... 
``` #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 119 Context: 5.2 Trees ========= you think that the shortest path between any two nodes would land right on this Prim network? Yet if you compare Figure 5.14 with Figure 5.13 you'll see that the quickest way from Bordeaux to Strasbourg is through Marseille, not Vichy. So we end up with the remarkable fact that the shortest route between two points has nothing whatsoever to do with the shortest total distance between all points. Who knew? 5.2 Trees --------- A tree is really nothing but a simplification of a graph. There are two kinds of trees in the world: free trees, and rooted trees. ### Free trees A free tree is just a connected graph with no cycles. Every node is reachable from the others, and there’s only one way to get anywhere. Take a look at Figure 5.15. It looks just like a graph (and it is) but unlike the WWII France graph, it’s more skeletal. This is because in some sense, a free tree doesn’t contain anything “extra.” If you have a free tree, the following interesting facts are true: 1. There’s exactly one path between any two nodes. (Check it!) 2. If you remove any edge, the graph becomes disconnected. (Try it!) 3. If you add any new edge, you end up adding a cycle. (Try it!) 4. If there are n nodes, there are n - 1 edges. (Think about it!) So basically, if your goal is connecting all the nodes, and you have a free tree, you’re all set. Adding anything is redundant, and taking away anything breaks it. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 128 Context: # CHAPTER 5. STRUCTURES Finally, to traverse a tree in-order, we: 1. Treat the left child and all its descendants as a subtree, and traverse it in its entirety. 2. Visit the root. 3. Traverse the right subtree in its entirety. 
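Prim's algorithm from the previous section produces exactly such a free tree: its chosen edges connect all n nodes using n − 1 edges and no cycles. A sketch of the greedy loop (the code, graph, and weights are invented here, not taken from the book):

```python
# A sketch of Prim's algorithm: repeatedly connect the closest
# not-yet-connected node via the shortest available edge.
def prim(graph, start):
    connected = {start}
    chosen_edges = []
    while len(connected) < len(graph):
        # Among all edges leaving the connected set, take the shortest.
        best = min(((u, v, w) for u in connected
                    for v, w in graph[u] if v not in connected),
                   key=lambda edge: edge[2])
        connected.add(best[1])
        chosen_edges.append(best)
    return chosen_edges

graph = {"A": [("B", 1), ("C", 4)],
         "B": [("A", 1), ("C", 2)],
         "C": [("A", 4), ("B", 2)]}
print(prim(graph, "A"))  # → [('A', 'B', 1), ('B', 'C', 2)]
```

Note the result is a free tree: 3 nodes, 2 edges, total weight 3, and greedily skipping the weight-4 edge did no harm.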
So instead of visiting the root first (pre-order) or last (post-order), we visit it in between our left and right children. This might seem to be a strange thing to do, but there's a method to the madness which will become clear in the next section. For the sample tree, the first visited node is B. This is because it's the first node encountered that doesn't have a left subtree, which means step 1 doesn't need to do anything. This is followed by O and I, for the same reason. We then visit K before its right subtree, which in turn visits G, M, E, C, A, H, F, B, L, N. (See Figure 5.20.) If your nodes are spaced out evenly, you can read the in-order traversal off the diagram by moving your eyes left to right. Be careful about this, though, because ultimately the spatial position doesn't matter, but rather the relationships between nodes. For instance, if I had drawn node I further to the right, in order to make the lines between D–O–I less steep, that I node might have been pushed physically to the right of K. But that wouldn't change the order of the traversal: I would still be visited before K. Finally, it's worth mentioning that all of these traversal methods make elegant use of recursion. Recursion is a way of taking a large problem and breaking it up into similar, but smaller, sub-problems. Then, each of those subproblems can be attacked in the same way as you attacked the larger problem: by breaking them up into sub-subproblems. All you need is a rule for eventually stopping the "breaking up" process by actually doing something. Every time one of these traversal processes treats a left or right child as a subtree, they are "recursing" by re-initiating the whole traversal process on a smaller tree. Pre-order traversal, for instance, #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 135 Context: # 5.2 TREES ![Figure 5.25: NOT a binary search tree, though it looks like one at first glance.
(Notice Nancy and Mitch)](path/to/image) **Possible?** The trick is to realize that with every node you look at, you effectively eliminate half of the remaining tree from consideration. For instance, if we're looking for Molly, we can disregard Mitch's entire left half without even looking at it, then the same for Randi's entire right half. If you discard half of something, then half of the remaining half, it doesn’t take you long before you’ve eliminated almost every false lead. There’s a formal way to describe this speedup, called **Big-O notation**. The subtleties are a bit complex, but the basic idea is this. When we say that an algorithm is **O(n)** (pronounced "oh-of-n"), it means that the time it takes to execute the algorithm is *proportional* to the number of nodes. This doesn’t imply any specific number of milliseconds or anything — that is highly dependent on the type of computer hardware, the programming language, and a myriad of other things. But what we can say about an **O(n)** algorithm is that if you double the number of nodes, you’re going to approximately double the running time. If you quadruple the number of nodes, you’re going to quadruple the running time. This is what you’d expect. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 142 Context: # CHAPTER 5. STRUCTURES Thirdly, there are specialized data structures you may learn about in future courses, such as AVL trees and red-black trees, which are binary search trees that add extra rules to prevent imbalancing. Basically, the idea is that when a node is inserted (or removed), certain metrics are checked to make sure that the change didn’t cause too great an imbalance. If it did, the tree is adjusted to minimize the imbalance. This comes at a slight cost every time the tree is changed, but prevents any possibility of a lopsided tree that would cause slow lookups in the long run. 
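The two ideas in play here — the recursive in-order traversal and the half-discarding BST lookup — can both be sketched on a tuple-encoded binary tree. The code and the tuple representation are mine, not the book's; the names merely echo the figure's characters.

```python
# A sketch using nested (left, root, right) tuples, with None marking a
# missing child.
def in_order(tree):
    if tree is None:            # the rule that stops the recursion
        return []
    left, root, right = tree
    return in_order(left) + [root] + in_order(right)

def bst_contains(tree, target):
    if tree is None:
        return False
    left, root, right = tree
    if target == root:
        return True
    if target < root:           # disregard the entire right half
        return bst_contains(left, target)
    return bst_contains(right, target)  # disregard the entire left half

bst = (((None, "Jessica", None), "Mitch", (None, "Molly", None)),
       "Randi", (None, "Xander", None))

# In-order traversal of a binary search tree comes out in sorted order --
# the "method to the madness" mentioned earlier:
print(in_order(bst))  # → ['Jessica', 'Mitch', 'Molly', 'Randi', 'Xander']
print(bst_contains(bst, "Molly"), bst_contains(bst, "Nancy"))  # → True False
```

Each `bst_contains` call looks at one node and recurses into only one of its two subtrees, which is exactly the halving that makes lookups O(lg n) in a balanced tree.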
## 5.3 Final Word Whew, that was a lot of information about structures. Before we continue our walk in the next chapter with a completely different topic, I'll leave you with this summary thought. Let \( BST \) be the set of Binary Search Trees, and \( BT \) be the set of Binary Trees. Let \( RT \) be the set of rooted trees, and \( T \) be the set of trees (free or rooted). Finally, let \( CG \) be the set of connected graphs, and \( G \) the set of all graphs. Then we have: \[ BST \subseteq BT \subseteq RT \subseteq T \subseteq CG \subseteq G \] It's a beautiful thing. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 145 Context: # 5.4. EXERCISES 17. If this graph represented an endorelation, how many ordered pairs would it have? **Answer:** 10. 18. Suppose we traversed the graph below in depth-first fashion, starting with node P. In what order would we visit the nodes? ![Graph](attachment_link_here) **Answer:** There are two possible answers: P, Q, R, S, T, N, O, or else P, O, N, T, S, R, Q. (The choice just depends on whether we go "left" or "right" initially.) Note in particular that either O or Q is at the very end of the list. 19. Now we traverse the same graph in breadth-first fashion, starting with node P. Now in what order would we visit the nodes? **Answer:** Again, two possible answers: P, O, Q, N, R, T, S, or else P, Q, O, R, N, S, T. Note in particular that both O and Q are visited very early. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 149 Context: # Chapter 6 ## Counting If the title of this chapter seems less than inspiring, it's only because the kind of counting we learned as children was mostly of a straightforward kind.
In this chapter, we’re going to learn to answer some more difficult questions like “how many different semester schedules could a college student possibly have?” and “how many different passwords can a customer choose for this e-commerce website?” and “how likely is this network buffer to overflow, given that its packets are addressed to three different destinations?” The more impressive-sounding name for this topic is **combinatorics**. In combinatorics, we focus on two tasks: counting things (to find out how many there are), and enumerating things (to systematically list them as individuals). Some things turn out to be hard to count but easy to enumerate, and vice versa. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 150 Context: # 6.1 The Fundamental Theorem We start with a basic rule that goes by the audacious name of **The Fundamental Theorem of Counting**. It goes like this: If a whole can be divided into \( k \) parts, and there’s \( n_i \) choices for the \( i \)-th part, then there’s \( n_1 \times n_2 \times n_3 \times \cdots \times n_k \) ways of doing the whole thing. ## Example Jane is ordering a new Lamborghini. She has twelve different paint colors to choose from (including Luscious Red and Sassy Yellow), three different interiors (Premium Leather, Bonded Leather, or Vinyl), and three different stereo systems. She must also choose between automatic and manual transmission, and she can get power locks & windows (or not). How many different configurations does Jane have to choose from? To put it another way, how many different kinds of cars could come off the line for her? The key is that every one of her choices is independent of all the others. Choosing an Envious Green exterior doesn't constrain her choice of transmission, stereo, or anything else. 
So no matter which of the 12 paint colors she chooses, she can independently choose any of the three interiors, and no matter what these first two choices are, she can freely choose any of the stereos, etc. It's a mix-and-match. Therefore, the answer is: \[ 12 \times 3 \times 3 \times 2 \times 2 = 432 \] Here's an alternate notation you'll run into for this, by the way: \[ \prod_{i=1}^{k} n_i \] **How many other "Fundamental Theorems" of math do you know?** Here are a few: the Fundamental Theorem of Arithmetic says that any natural number can be broken down into its prime factors in only one way. The Fundamental Theorem of Algebra says that the highest power of a polynomial is how many roots (zeros) it has. The Fundamental Theorem of Linear Algebra says that the row space and the column space of a matrix have the same dimension. The Fundamental Theorem of Calculus says that integration and differentiation are the inverses of each other.
There are four digits, each of which can be any value from 0 to 9 (ten total values), so the answer is: \[ 10 \times 10 \times 10 \times 10 = 10,000 \text{ different PINs.} \] So a thief at an ATM machine frantically entering PINs at random (hoping to break your account before you call and stop your debit card) would have to try about 5,000 of them on average before cracking the code. What about middle school bullies who are trying to break into your locker? Well, most combination locks are opened by a three-number sequence, each number of which is anything from 0 to 39. So there are: \[ 40 \times 40 \times 40 = 64,000 \text{ different combinations.} \] That's probably slightly overstated, since I'll bet consecutive repeat numbers are not allowed (Master probably doesn't manufacture a #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 156 Context: # CHAPTER 6. COUNTING This pattern comes up so much that mathematicians have established a special notation for it: \[ n \times (n - 1) \times (n - 2) \cdots \times 1 = n! \; (\text{``n-factorial''}) \] We say there are "3-factorial" different brushing orders for the Davies kids. For our purposes, the notion of factorial will only apply for integers, so there's no such thing as 23.46! or π! (In advanced computer science applications, however, mathematicians sometimes do define factorial for non-integers.) We also define 0! to be 1, which might surprise you. This comes up a heck of a lot. If I give you a jumbled set of letters to unscramble, like "KRIBS" (think of the Jumble® word game in the newspaper), how many different unscramblings are there? The answer is 5! or 120, one of which is BRISK. Let's say I shuffle a deck of cards before playing War. How many different games of War are there?
The answer is 52!, since any of the cards in the deck might be shuffled on top, then any but that top card could be second, then any but those two could be third, etc. Ten packets arrive near-simultaneously at a network router. How many ways can they be queued up for transmission? 10! ways, just like a larger Davies family. The factorial function grows really, really fast, by the way, even faster than exponential functions. A five-letter word like "BRISK" has 120 permutations, but "AMBIDEXTROUSLY" has 87,178,291,200, ten times the population of the earth. The number of ways to shuffle a deck is \[ 52! \approx 8.065 \times 10^{67} \] so I don't think my boys will end up playing the same War game twice any time soon, nor my wife and I the same bridge hand. --- **"War"** is a mindless card game which involves no strategy or decision-making on the part of the players. Once you shuffle the initial deck, the entire outcome of the game is fixed. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 158 Context: # CHAPTER 6. COUNTING and so on. Once we have the **RISK** permutations, we can generate the **BRISK** permutations in the same way:

| B | R | I | S | K |
|---|---|---|---|---|
| R | B | I | S | K |
| R | I | B | S | K |
| R | I | S | B | K |
| R | I | S | K | B |
| B | I | R | S | K |
| I | B | R | S | K |
| I | R | B | S | K |
| I | R | S | B | K |
| I | R | S | K | B |
| . . . | | | | |

Another algorithm to achieve the same goal (though in a different order) is as follows: ## Algorithm #2 for enumerating permutations 1. Begin with a set of **n** objects. a. If **n = 1**, there is only one permutation; namely, the object itself. b. Otherwise, remove each of the objects in turn, and prepend that object to the permutations of all the others, creating another permutation each time. I find this one a little easier to get my head around, but in the end it's personal preference.
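Algorithm #2 translates almost line-for-line into a recursive sketch (the code and the function name are mine, not the book's):

```python
# A sketch of Algorithm #2: remove each object in turn and prepend it to
# every permutation of the rest.
def enumerate_permutations(objects):
    if len(objects) == 1:
        return [objects]                       # base case: one permutation
    perms = []
    for i, obj in enumerate(objects):
        rest = objects[:i] + objects[i + 1:]   # remove the object...
        for p in enumerate_permutations(rest):
            perms.append(obj + p)              # ...and prepend it
    return perms

perms = enumerate_permutations("RISK")
print(len(perms))   # → 24, i.e. 4!
print(perms[:3])    # → ['RISK', 'RIKS', 'RSIK']
```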
The permutations of **BRISK** are: "B followed by all the permutations of **RISK**, plus **R** followed by all the permutations of **BISK**, plus **I** followed by all the permutations of **BRSK**, etc." So the full list of permutations of the 4-letter word **RISK** comes out as: - R I S K - R I K S - R S I K - R S K I - R K I S - R K S I - I R S K - I R K S - I S R K - I S K R - I K R S - I K S R - S R I K - S R K I - S I R K - S I K R - S K R I - S K I R - K R I S - K R S I - K I R S - K I S R - K S R I - K S I R #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 160 Context: # CHAPTER 6. COUNTING 10. Bubba, then it doesn't matter whether Jason finished 11th or 45th. It's easy to see that there are 45 possible winners; then for each winner there are 44 possible second-placers, etc., so this total turns out to be: 45×44×43×42×41×40×39×38×37×36 = 11,576,551,623,436,800 finishes. Each of the finishes is called a *partial permutation*. It's a permutation of k items chosen from n total, and is denoted P(n, k). The number of such permutations works out to: $$ n \times (n - 1) \times (n - 2) \times \cdots \times (n - k + 1) $$ The "n - k + 1" bit can be confusing, so take your time and think it through. For the golf tournament case, our highest term was 45 and our lowest term was 36. This is because n was 45 and k was 10, and so we only wanted to carry out the multiplication to 36 (not 35), and 36 is 45 - 10 + 1. This can be expressed more compactly in a few different ways. First, we can use factorials to represent it: $$ n \times (n - 1) \times (n - 2) \times \cdots \times (n - k + 1) = \frac{n \times (n - 1) \times \cdots \times (n - k + 1) \times (n - k) \times (n - k - 1) \times \cdots \times 1}{(n - k) \times (n - k - 1) \times \cdots \times 1} = \frac{n!}{(n - k)!} $$ Also, we could use our compact product notation: $$ n \times (n - 1) \times (n - 2) \times \cdots \times (n - k + 1) = \prod_{i=0}^{k-1} (n - i).
$$ Finally, as with (non-partial) permutations, this comes up so much that the professionals have invented a special notation for it. It looks like a power, but has an underline under the exponent: $$ n \times (n - 1) \times (n - 2) \times \cdots \times (n - k + 1) = n^{\underline{k}}. $$ #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 161 Context: # 6.2. PERMUTATIONS This is pronounced "n-to-the-k-falling," and was invented by one of the most brilliant computer scientists in history, Donald Knuth. To keep straight what \( n^{\underline{k}} \) means, think of it as the same as plain exponentiation, except that the product diminishes instead of staying the same. For example, "17-to-the-6th" is \[ 17^6 = 17 \cdot 17 \cdot 17 \cdot 17 \cdot 17 \cdot 17 \] but "17-to-the-6th-falling" is \[ 17^{\underline{6}} = 17 \cdot 16 \cdot 15 \cdot 14 \cdot 13 \cdot 12. \] In both cases, you're multiplying the same number of terms; it's just that in the second case, these terms are "falling." Anyway, notation aside, partial permutations abound in practice. A late night movie channel might show four classic films back to back every evening. If there are 500 films in the studio's library, how many nightly TV schedules are possible? Answer: \( 500^{\underline{4}} \), since there are 500 choices of what to show at 7 pm, then 499 choices for 9 pm, 498 for 11 pm, and 497 for the 1 am late show. The fastest 41 auto racers will qualify for Sunday's race, and will be placed from Pole Position on down depending on their qualifying time. If 60 cars participate in the qualifying heat, then there are \( 60^{\underline{41}} \) different possible starting configurations for Sunday. Middle schoolers entering sixth grade will be assigned a semester schedule that consists of five "blocks" (periods), each of which will have one of thirteen classes (science, math, orchestra, study hall, etc.). How many schedules are possible? You guessed it, \( 13^{\underline{5}} \).
Notice that this is the correct answer only because no repeats are allowed: we don’t want to schedule any student for American History more than once. If a student could take the same class more than once in a day, then there would be \( 13^5 \) (not “falling”) different possible schedules. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 163 Context: # 6.3. COMBINATIONS 1. Phil 2. Bubba 3. Tiger Another one is: 1. Phil 2. Tiger 3. Bubba and yet another is: 1. Tiger 2. Phil 3. Bubba Now the important thing to recognize is that in our present problem — counting the possible number of regional-bound golf trios — all three of these different partial permutations represent the same combination. In all three cases, it's Bubba, Phil, and Tiger who will represent our local golf association in the regional competition. So by counting all three of them as separate partial permutations, we've overcounted the combinations. Obviously, we want to count Bubba/Phil/Tiger only once. Okay then. How many times did we overcount it when we counted partial permutations? The answer is that we counted this trio once for every way it can be permuted. The three permutations, above, were examples of this, and so are these three: 1. Tiger 2. Bubba 3. Phil 1. Bubba 2. Tiger 3. Phil 1. Bubba 2. Phil 3. Tiger This makes a total of six times that we (redundantly) counted the same combination when we counted the partial permutations. Why 6? Because that's the value of 3!, of course. There are 3! different ways to arrange Bubba, Phil, and Tiger, since that's just a straight permutation of three elements. And so we find that every threesome we want to account for, we have counted 6 times. 
The way to get the correct answer, then, is obviously to correct for this overcounting by dividing by 6: \[ \frac{45 \times 44 \times 43}{3!} = \frac{45 \times 44 \times 43}{6} = 14,190 \text{ different threesomes.} \] #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 165 Context: ``` # 6.3. COMBINATIONS The coefficients for this binomial are of course 1 and 1, since "x" really means "1 · x." Now if we multiply this by itself, we get: $ (x + y) \cdot (x + y) = x^2 + 2xy + y^2 $, the coefficients of the terms being 1, 2, and 1. We do it again: $ (x^2 + 2xy + y^2) \cdot (x + y) = x^3 + 3x^2y + 3xy^2 + y^3 $ to get 1, 3, 3, and 1, and do it again: $ (x^3 + 3x^2y + 3xy^2 + y^3) \cdot (x + y) = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 $ to get 1, 4, 6, 4, and 1. At this point you might be having flashbacks to Pascal's triangle, which perhaps you learned about in grade school, in which each entry in a row is the sum of the two entries immediately above it (to the left and right), as in Figure 6.1. (If you never learned that, don't worry about it.)

```
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
```

**Figure 6.1:** The first six rows of Pascal's triangle. Now you might be wondering where I'm going with this. What do fun algebra tricks have to do with counting combinations of items? The answer is that the values of $ \binom{n}{k} $ are precisely the coefficients of these multiplied polynomials. Let n be 4, which corresponds to the last polynomial we multiplied out. We can then compute all the combinations of items taken from a group of four: - $ \binom{4}{0} = 1 $ - $ \binom{4}{1} = 4 $ - $ \binom{4}{2} = 6 $ - $ \binom{4}{3} = 4 $ - $ \binom{4}{4} = 1 $ In other words, there is exactly **one** way of taking no items out of 4 (you simply don't take any). There are **four** ways of taking one out of 4.
``` #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 168 Context: # CHAPTER 6. COUNTING We each do a quick 30-minute workout session divided into six 5-minute blocks, and we work with one of the machines during each block, taking turns spotting each other. One day my friend asks me, “Hey Stephen, have you ever wondered: how many different workout routines are possible for us?” I was, of course, wondering exactly that. But the correct answer turns out to hinge very delicately on exactly what “a workout routine” is. If we could select any weight machine for any 5-minute block, then the answer is \( 6^{18} \), since we have 18 choices for our first block, 18 choices for our second, and so on. (This comes to 34,012,224 different routines, if you’re interested.) However, on further inspection, we might change our mind about this. Does it make sense to choose the same machine more than once in a 30-minute workout? Would we really complete a workout that consisted of “1. Biceps, 2. Abs, 3. Pec, 4. Biceps, 5. Biceps, 6. Biceps?” If not (and most trainers would probably recommend against such monomanical approaches to exercise), then the real answer is only \( 6^{18} \), since we have 18 choices for our first block, and then only 17 for the second, 16 for the third, etc. (This reduces the total to 13,360,080.) But perhaps the phrase “a workout routine” means something different even than that. If I tell my physical therapist what “my workout routine” consisted of this morning, does he really care whether I did triceps first, last, or in the middle? He probably only cares about which machines (and therefore which muscle groups) I worked out that morning, not what order I did them in. If this is true, then our definition of a workout routine is somewhat different than before. It’s no longer a consecutive sequence of machines chosen, but rather a set of six machine choices. 
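The three readings of “a workout routine” can be checked with Python’s `math` helpers (18 machines and six blocks, as in the text; `math.perm` and `math.comb` require Python 3.8+):

```python
import math

machines, blocks = 18, 6

any_machine_any_block = machines ** blocks   # repeats allowed: 18^6
no_repeats = math.perm(machines, blocks)     # ordered, no repeats: 18*17*...*13
unordered_set = math.comb(machines, blocks)  # order ignored entirely

print(any_machine_any_block)  # 34012224
print(no_repeats)             # 13366080
print(unordered_set)          # 18564
```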
There would be only \( \binom{18}{6} \) of those, or a mere 18,564. So as you can see, the answer radically depends on the precise interpretation of the concepts, which means that to successfully do combinatorics, you have to slow down and think very carefully. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 171 Context: # 6.5. EXERCISES 1. To encourage rivalry and gluttony, we're going to give a special certificate to the child who collects the most candy at the end of the night. And while we're at it, we'll give 2nd-place and 3rd-place certificates as well. How many different ways could our 1st-2nd-3rd contest turn out? **Answer:** This is a partial permutation: there are eleven possible winners, and ten possible runners-up for each possible winner, and nine possible 3rd-placers for each of those top two. The answer is therefore \( 11 \times 10 \times 9 \), or 990. Wow! I wouldn’t have guessed that high. 2. Finally, what if we want every kid to get a certificate with their name and place-of-finish on it? How many possibilities? (Assume no ties.) **Answer:** This is now a full-blown permutation: \( 11! \). It comes to 39,916,800 different orders-of-finish, believe it or not. I told you: this counting stuff can explode fast. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 173 Context: # Chapter 7 ## Numbers Wow, last chapter was about “counting,” and this one is about “numbers.” It sure seems like we’re regressing back to first grade or earlier. And indeed, this chapter will contain a repeat of some elementary school concepts! But this is so we can re-examine the foundations and generalize them somewhat. The mechanical processes you’ve always used with numbers — adding, subtracting, comparing, checking whether something divides evenly, working with place value — are all correct, but they’re all hard-coded for decimal numbers.
The word “decimal,” in this chapter, won’t mean “a number with a decimal point,” like 5.62, but rather a number expressed in base 10. And what does “expressed in base 10” mean? It means that the digits, from right to left, represent a “one’s place,” a “ten’s place,” a “hundred’s place,” and so on. This is what we all learned in grade school, and perhaps you thought that’s just how numbers “were.” But it turns out that 1, 10, 100, 1000, …, is just one choice of place values, and that we could equally well choose many other things, like 1, 2, 4, 8, …, or 1, 16, 256, 4096, …, or even 1, 23, 529, 12167, …, as long as those values are of a certain type (successive powers of the base). It’s the concept of bases, and specifically bases other than 10, that will cause us to rethink some things. It’ll feel unnatural at first, but soon you’ll discover that there are aspects of how you work with numbers that are unnecessarily specific, and that it’s freeing to generalize them. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 175 Context: # 7.1. WHAT IS A “NUMBER?” When you think of a number, I want you to try to erase the sequence of digits from your mind. Think of a number as what it is: a **quantity**. Here's what the number seventeen really looks like: *(a picture of seventeen circles)* It’s just an **amount**. There are more circles in that picture than in some pictures, and fewer than in others. But in no way is it “two digits,” nor do the particular digits “1” and “7” come into play any more or less than any other digits. Let’s keep thinking about this. Consider this number, which I’ll label “A”: (A) Now let’s add another circle to it, creating a different number I’ll call “B”: (B) And finally, we’ll do it one more time to get “C”: (C) (Look carefully at those images and convince yourself that I added one circle each time.) When going from A to B, I added one circle. When going from B to C, I also added one circle.
Now I ask you: was going from B to C any more “significant” than going from A to B? Did anything qualitatively different happen? The answer is obviously no. Adding a circle is adding a circle; there’s nothing more to it than that. But if you had been writing #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 200 Context: # 7.5 Exercises 1. If I told you that the decimal number (i.e., base-10 number) 2022 was equal to 1413₆, would you call me a liar without even having to think too hard? Yes, you should. Think about it: in base 6, each digit's place value (except the one's place) is worth less than it is in base 10. Instead of a ten's place, hundred's place, and thousand's place, we have a mere six's place, thirty-six's place, and two-hundred-and-sixteen's place. So there's no way that a number whose base-6 digits are 1, 4, 1, and 3 could be as large as a number whose base-10 digits are 2, something, something, and something. Put another way, if the base is smaller, the number itself has to "look bigger" to have a chance of evening that out. 2. If I told you that the decimal number 2022 was equal to 5FA8₁₆, would you call me a liar without even having to think too hard? Yes, you should, because of the mirror reflection of the above logic. Every digit of a hexadecimal number (again, except the one's place) is worth more than it is in base 10. So a four-digit hex number beginning with a 5 is going to be way bigger than a wimpy four-digit decimal number beginning with 2. 3. If I told you that the decimal number 2022 was equal to 12310₆, would you call me a liar without even having to think too hard? No, you shouldn't, because you do have to think hard for this one. As it happens, I am a liar (the true answer is 13210₆), but there's no easy way to know that at a glance. 4. If I told you that the decimal number 2022 was equal to 7E6₁₆, would you call me a liar without even having to think too hard? No, you shouldn't, because you do have to think hard for this one. (And in fact, it's correct! Work it out.) #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 205 Context: # Chapter 8 ## Logic To a great extent, logic governs the way your mind works, even among so-called “irrational people.” If we want to capture logical processes and represent them in a computer program, we need a way to express these thoughts in a form suitable for automated reasoning. This is primarily why computer scientists study logic. Interestingly, the material in this chapter covers the very bottom and the very top of the technology stack. At the bottom, we have actual physical hardware that consists of circuits turning bits on and off. The rules that govern when we want to turn which bits on and off are based on “logic gates,” or tiny physical devices that implement the logical principles of this chapter on a micro scale.
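The base-comparison reasoning in the exercises above is easy to check mechanically; a small sketch using Python's built-in `int(string, base)` parser (digit strings as in the exercises):

```python
# Interpret each digit string in its stated base and compare against 2022.
print(int("1413", 6))   # 369   -- a four-digit base-6 number is far too small
print(int("5FA8", 16))  # 24488 -- a four-digit hex number is far too big
print(int("7E6", 16))   # 2022  -- this one checks out
```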
At the other end of the spectrum, we have highly abstract programs aiming towards “artificial intelligence.” These systems are centered around a “knowledge base” of accumulated facts, and regularly examine those known facts to make decisions and draw additional conclusions. What does a knowledge base consist of? You guessed it: logical statements that are described in this chapter. ### 8.1 Propositional logic The simpler — but less powerful — of the two logic systems we’ll study is called **propositional logic**. It has this name because the core building block is the **proposition**. A proposition is simply a #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 219 Context: # 8.2. Predicate Logic which is perfectly true[^1]. You may recall the word "predicate" from your middle school grammar class. Every sentence, remember, has a subject and a predicate. In “Billy jumps,” “Billy” is the subject, and “jumps” the predicate. In “The lonely boy ate spaghetti with gusto,” we have “the lonely boy” as the subject and “ate spaghetti with gusto” as the predicate. Basically, a predicate is anything that can describe or affirm something about a subject. Imagine asserting “JUMPS(Billy)” and “ATESPAGHETTIWITHGUSTO(lonely boy).” A predicate can have more than one input. Suppose we define the predicate `IsFanOf` as follows: Let `IsFanOf(x, y)` be the proposition that x digs the music of rock band y. Then I can assert: - `IsFanOf(Stephen, Led Zeppelin)` - `IsFanOf(Rachel, The Beatles)` - `IsFanOf(Stephen, The Beatles)` - `¬IsFanOf(Stephen, The Rolling Stones)` We could even define `TraveledToByModeInYear` with a bunch of inputs: Let `TraveledToByModeInYear(p, d, m, y)` be the proposition that person p traveled to destination d by mode m in year y. 
The following statements are then true: - `TraveledToByModeInYear(Stephen, Richmond, car, 2017)` > "By the way, when I say you can give any input at all to a predicate, I mean any individual element from the domain of discourse. I don’t mean that a set of elements can be an input. This limitation is why it’s called ‘first-order’ predicate logic. If you allow sets to be inputs to predicates, it’s called ‘second-order predicate logic’, and can get quite messy." #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 224 Context: # CHAPTER 8. LOGIC ## Order matters When you're facing an intimidating morass of \( \forall \)s and \( \exists \)s and \( \vee \)s and \( \wedge \)s, and God knows what else, it's easy to get lost in the sauce. But you have to be very careful to dissect the expression to find out what it means. Consider this one: \[ \forall x \in \mathbb{R}, \exists y \in \mathbb{R} \quad y = x + 1. \] (8.5) This statement is true. It says that for every single real number (call it \( x \)), it's true that you can find some other number (call it \( y \)) that's one greater than it. If you generate some examples it's easy to see this is true. Suppose we have the real number \( x = 5 \). Is there some other number \( y \) that's equal to \( x + 1 \)? Of course, the number 6. What if \( x = -32.4 \)? Is there a number \( y \) that satisfies this equation? Of course, \( y = -31.4 \). Obviously, no matter what number \( x \) we choose, we can find the desired number \( y \) just by adding one. Hence this statement is true for all \( x \), just like it says. What happens, though, if we innocently switch the order of the quantifiers? Let's try asserting this: \[ \exists y \in \mathbb{R}, \forall x \in \mathbb{R} \quad y = x + 1. \] (8.6) Is this also true? Look carefully.
It says “there exists some magic number \( y \) that has the following amazing property: no matter what value of \( x \) you choose, this \( y \) is one greater than \( x \)!” Obviously this is not true. There is no such number \( y \). If I choose \( y = 13 \), that works great as long as I choose \( x = 12 \), but for any other choice of \( x \), it’s dead in the water. ## The value of precision This fluency with the basic syntax and meaning of predicate logic was our only goal in this chapter. There are all kinds of logical rules that can be applied to predicate logic statements in order to #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 226 Context: # Chapter 8: Logic Here's a funny one I’ll end with. Consider the sentence **“He made her duck.”** What is intended here? Did some guy reach out with his hand and forcefully push a woman’s head down out of the way of a screaming projectile? Or did he prepare a succulent dish of roasted fowl to celebrate her birthday? Oh, if the computer could only know. If we’d used predicate logic instead of English, it could! #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 228 Context: 220 # CHAPTER 8. LOGIC 21. What’s \( C \rightarrow R \)? True. (The premise is false, so all bets are off and the sentence is true.) 22. What’s \( C \leftrightarrow B \)? False. (The truth values of the left and right sides are not the same.) 23. What’s \( C \leftrightarrow B \)? True. (The truth values of the left and right sides are the same.) 24. I make this assertion: \(\neg X \land \neg (Z \rightarrow Q)\). And since I'm the professor, you can assume I'm correct about this. From this information alone, can you determine a unique set of values for the four variables? Or is there more than one possibility for them? 25. 
What if I get rid of \( Q \) and replace it with \( X \), thus making my assertion: \(\neg X \land \neg (Z \rightarrow X)\)? Now what are the solutions? 26. At the time of this writing, all professors are human, and that’s what I’ll be assuming in these exercises. 27. True or false: \(\forall X \, \text{Professor}(X)\). False. This says “everyone and everything is a professor,” which is clearly not true. (Consider what you have in front of you as a counterexample.) 28. True or false: \(\forall X \, \text{Human}(X)\). False. This says “everyone and everything is human,” which is clearly not true. (Consider the book in front of you as a counterexample.) #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 231 Context: # Chapter 9 ## Proof We've seen a lot of pretty sights on our cool brisk walk. We've caught a glimpse of the simple elegance of sets and relations, the precision of probabilistic reasoning, the recursive structure of trees, the explosive nature of combinatorics, and much more. None of these things have we plumbed to the depths, but we've appreciated their beauty and taken note of where they stood along our blazed trail. You’ll remember this hike when you run into such concepts again and again in future computer science and math courses, and in your career beyond academics. Now we have one more step to make before returning to the trailhead, and that deals with the notion of **proof**. As we've studied these various mathematical entities, I’ve pointed out certain of their properties. A free tree has one more vertex than edge, for example. The cardinality of the union of two sets is at least as big as each of their individual cardinalities. If you flip-all-the-bits-and-add-one in a two's complement scheme, and then perform that flip-and-add operation again, you’ll return to the original number. But with a few exceptions, we haven't proven any of these things.
I’ve just stated them, and you’ve taken them on faith. In order to establish reliable truth, of course, professional mathematicians aren’t satisfied with unsubstantiated statements. They need to be convinced that the claims we make do truly hold, and provably so, in all circumstances. What they seek is a **proof** of a... #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 234 Context: # CHAPTER 9. PROOF That proposition B must be false. What else, if anything, can now be safely concluded from this? It turns out that we can now conclude that F is also false. How do we know this? Here’s how: 1. Fact #4 says that either A or B (or both) is true. But we just discovered that B was false. So if it ain't B, it must be A, and therefore we conclude that A must be true. (For the curious, this rule of common sense is called a “disjunctive syllogism.”) 2. Now if A is true, we know that C must also be true, because fact #1 says that A implies C. So we conclude that C is true. (This one goes by the Latin phrase “modus ponens.”) 3. Fact #2 says \( C \land D \) must be false. But we just found out that C was true, so it must be D that’s false in order to make the conjunction false. So we conclude that D is false. (This is a disjunctive syllogism in disguise, combined with De Morgan’s law.) 4. Finally, fact #3 tells us that if either F were true or E were false, then that would imply that D would be true. But we just found out that D is false. Therefore, neither F nor ¬E can be true. (This step combines “modus tollens” with “disjunction elimination.”) So we conclude that F must be false. Q.E.D. (The letters “Q.E.D.” at the end of a proof stand for a Latin phrase meaning, “we just proved what we set out to prove.” It’s kind of a way to flex your muscles as you announce that you’ve done it.) The things we’re allowed to start with are called axioms (or postulates). 
An axiom is a presupposition or definition that is given. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 236 Context: # CHAPTER 9. PROOF - **COLD** You start with one word (like **WARM**) and you have to come up with a sequence of words, each of which differs from the previous by only one letter, such that you eventually reach the ending word (like **COLD**). It's sort of like feeling around in the dark: - **WARM** - **WART** - **WALT** - **WILT** - **WILD** - ... This attempt seemed promising at first, but now it looks like it's going nowhere. ("**WOLD?**" "**CILD?**" Hmm...) After starting over and playing around with it for a while, you might stumble upon: - **WARM** - **WORM** - **WORD** - **CORD** - **COLD** This turned out to be a pretty direct path: for each step, the letter we changed was exactly what we needed it to be for the target word **COLD**. Sometimes, though, you have to meander away from the target a little bit to find a solution, like going from **BLACK** to **WHITE**: - **BLACK** - **CLACK** - **CRACK** - **TRACK** - **TRICK** - **TRICE** #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 237 Context: # 9.2. TYPES OF PROOF - **TRITE** - **WRITE** - **WHITE** Here, we had to temporarily change our first letter three different times — two of which seemingly brought us no nearer to WHITE — in order to successfully forge a path through the tangled forest. Knowing which direction to set out on is a matter of intuition plus trial and error. Given the axioms of any system (whether algebra, predicate logic, sets, etc.), there are an unfathomable number of different ways to proceed. The vast majority of them are bound to lead to dead ends. This is why a valid proof, when it is finished, is often an elegant and beautiful thing. It’s a thin braid of jewels glistening in the midst of a whole lot of mud.
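The trial-and-error hunt the word game describes can be mechanized as a shortest-path search; a toy Python sketch over a tiny, hypothetical word list (a real solver would load a full dictionary):

```python
from collections import deque

# A tiny hypothetical dictionary, just enough for the WARM -> COLD example.
WORDS = {"WARM", "WORM", "WORD", "CORD", "COLD", "WART", "WILT", "WILD"}

def neighbors(word):
    # Words in the dictionary differing from `word` by exactly one letter.
    return {w for w in WORDS
            if len(w) == len(word)
            and sum(a != b for a, b in zip(w, word)) == 1}

def ladder(start, goal):
    # Breadth-first search finds a shortest ladder, if any exists.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]) - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(ladder("WARM", "COLD"))  # ['WARM', 'WORM', 'WORD', 'CORD', 'COLD']
```

Like proof search, most branches the algorithm explores are dead ends; only the finished path looks inevitable.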
## Indirect proof Also known as a proof by contradiction or reductio ad absurdum, the indirect proof starts in a completely opposite way. It says, “Okay, I’m trying to prove X. Well, suppose for the sake of argument I assume that the opposite — not X — is true. Where would that lead me?” If you follow all the rules and it leads you to a contradiction, this tells you that the original assumption of ¬X must have been false. And this in turn proves that X must be true. We do this all the time in our thinking. Say you’re driving down the highway. How do you know that the alternator in your engine is working? A direct proof would require that you open the hood and examine the part, testing to ensure it works properly. An indirect proof simply says, “Well, suppose it weren’t working properly. Then my car engine wouldn’t operate. But here I am, driving down the road, and the engine obviously does operate, so that tells me that the alternator must be working properly.” One of the most famous indirect proofs dates from Euclid’s *Elements* in 300 B.C. It proves that the square root of 2 is an irrational number, a great surprise to mathematicians at the time (most of whom doubted the very existence of irrational numbers). Remember... #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 238 Context: 230 # CHAPTER 9. PROOF An irrational number is one that **cannot** be expressed as the ratio of two integers, no matter what the integers are. Proving this directly seems pretty hard, since how do you prove that there aren't any two integers whose ratio is \(\sqrt{2}\), no matter how hard you looked? I mean, \(534,927\) and \(378,250\) are pretty dang close: \[ \left( \frac{534927}{378250} \right)^2 = 2.000005. \] How could we possibly prove that no matter how hard we look, we can never find a pair that will give it to us exactly?
One way is to assume that \(\sqrt{2}\) is a rational number, and then prove that down that path lies madness. It goes like this: Suppose \(\sqrt{2}\) is rational, after all. That means that there must be two integers, call them \(a\) and \(b\), whose ratio is exactly equal to \(\sqrt{2}\): \[ \frac{a}{b} = \sqrt{2}. \] This, then, is the starting point for our indirect proof. We're going to proceed under this assumption and see where it leads us. By the way, it's clear that we could always reduce this fraction to lowest terms in case it's not already. For instance, if \(a = 6\) and \(b = 4\), then our fraction would be \(\frac{6}{4}\), which is the same as \(\frac{3}{2}\), so we could just say \(a = 3\) and \(b = 2\) and start over. Bottom line: if \(\sqrt{2}\) is rational, then we can find two integers \(a\) and \(b\) that have no common factor (if they do have a common factor, we'll just divide it out of both of them and go with the new numbers) whose ratio is \(\sqrt{2}\). Okay then. But now look what happens. Suppose we square both:
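Squaring both sides is where the contradiction emerges; a compact LaTeX sketch of the standard continuation of Euclid's argument:

```latex
\frac{a^2}{b^2} = 2 \quad\Longrightarrow\quad a^2 = 2b^2 .
% a^2 is even, so a itself must be even: write a = 2c for some integer c.
a = 2c \quad\Longrightarrow\quad 4c^2 = 2b^2 \quad\Longrightarrow\quad b^2 = 2c^2 .
% Now b^2 is even, so b is even too. But then a and b share the factor 2,
% contradicting the assumption that a/b was in lowest terms.
% Hence \sqrt{2} cannot be rational.
```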
In this way we’ve broken up a larger problem (traversing the big tree) into smaller problems (traversing the smaller trees F and B). The A node has very little to do: it just visits itself, then defers all the rest of the work onto its children. This idea of pawning off most of the work onto smaller subproblems that you trust will work is key to the idea of inductive proofs. Mathematical induction is hard to wrap your head around because it feels like cheating. It seems like you never actually prove anything: you defer all the work to someone else, and then declare victory. But the chain of reasoning, though delicate, is strong as iron. ## Casting the problem in the right form Let’s examine that chain. The first thing you have to be able to do is express the thing you’re trying to prove as a predicate about natural numbers. In other words, you need to form a predicate that has one input, which is a natural number. You’re setting yourself up to prove that the predicate is true for all natural numbers. (Or at least, all natural numbers of at least a certain size.) Suppose I want to prove that in the state of Virginia, all legal drinkers can vote. Then I could say “let VOTE(n) be the proposition that a citizen of age n can vote.” If I want to prove an algebraic identity, like \[ \sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \] then I have to figure out which variable is the one that needs to vary across the natural numbers. In this case it’s the \( n \) variable in my equation. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 244 Context: # Example 2 A famous story tells of Carl Friedrich Gauss, perhaps the most brilliant mathematician of all time, getting in trouble one day as a schoolboy. As punishment, he was sentenced to tedious work: adding together all the numbers from 1 to 100.
To his teacher's astonishment, he came up with the correct answer in a moment, not because he was quick at adding integers, but because he recognized a trick. The first number on the list (1) and the last (100) add up to 101. So do 2 and 99, and so do 3 and 98, and so do 4 and 97, etc., all the way up to 50 and 51. So really what you have here is 50 different sums of 101 each, so the answer is \( 50 \times 101 = 5050 \). In general, if you add the numbers from 1 to \( x \), where \( x \) is any integer at all, you'll get \( \frac{x(x + 1)}{2} \). Now, use mathematical induction to prove that Gauss was right (i.e., that \( \sum_{i=1}^{x} i = \frac{x(x + 1)}{2} \)) for all numbers \( x \). First, we have to cast our problem as a predicate about natural numbers. This is easy: say “let \( P(n) \) be the proposition that \[ \sum_{i=1}^{n} i = \frac{n(n + 1)}{2} \] Then, we satisfy the requirements of induction: 1. **Base Case**: We prove that \( P(1) \) is true simply by plugging it in. Setting \( n = 1 \), we have: \[ \sum_{i=1}^{1} i = \frac{1(1 + 1)}{2} \] \[ 1 = \frac{1(2)}{2} \] \[ 1 = 1 \quad \text{✓} \] 2. **Inductive Step**: We now must prove that \( P(k) \) implies \( P(k + 1) \). Put another way, we assume \( P(k) \) is true, and then use that assumption to prove that \( P(k + 1) \) is also true. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 251 Context: # 9.3. PROOF BY INDUCTION ## 2. Inductive Step We now must prove that \((\forall i \leq k) P(i) \Rightarrow P(k + 1)\). Put another way, we assume that all trees smaller than the one we’re looking at have one more node than edge, and then use that assumption to prove that the tree we’re looking at also has one more node than edge. We proceed as follows. Take any free tree with \(k + 1\) nodes. Removing any edge gives you two free trees, each with \(k\) nodes or less. (Why?
Well, if you remove any edge from a free tree, the nodes will no longer be connected, since a free tree is “minimally connected” as it is. And we can't break it into more than two trees by removing a single edge, since the edge connects exactly two nodes, and the nodes on each side of the removed edge are still connected to each other.) Now the sum of the nodes in these two smaller trees is still \(k + 1\). (This is because we haven't removed any nodes from the original free tree — we've simply removed an edge.) If we let \(k_1\) be the number of nodes in the first tree, and \(k_2\) the number of nodes in the second, we have \(k_1 + k_2 = k + 1\). Okay, but how many edges does the first tree have? Answer: \(k_1 - 1\). How do we know that? By the inductive hypothesis. We’re assuming that any tree smaller than \(k + 1\) nodes has one less edge than node, and so we’re taking advantage of that (legal) assumption here. Similarly, the second tree has \(k_2 - 1\) edges. Bingo. Removing one edge from our original tree of \(k + 1\) nodes gave us a total of \(k - 1\) edges. Therefore, that original tree must have had \(k\) edges. We have now proven that a tree of \(k + 1\) nodes has \(k\) edges, assuming that all smaller trees also have one less edge than node. ## 3. Conclusion Therefore, by the strong form of mathematical induction, \(\forall n \geq 1, P(n)\). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 252 Context: # 9.4 Final Word Finding proofs is an art. In some ways, it’s like programming: you have a set of building blocks, each one defined very precisely, and your goal is to figure out how to assemble those blocks into a structure that starts with only axioms and ends with your conclusion. It takes skill, patience, practice, and sometimes a little bit of luck.
Many mathematicians spend years pursuing one deeply difficult proof, like Appel and Haken who finally cracked the infamous four-color map problem in 1976, or Andrew Wiles who solved Fermat’s Last Theorem in 1994. Some famous mathematical properties may never have proofs, such as Christian Goldbach’s 1742 conjecture that every even integer is the sum of two primes; or the most elusive and important question in computing theory: does P=NP? (Put very simply: if you consider the class of problems where it’s easy to verify a solution once you have it, but crazy hard to find it in the first place, is there actually an easy algorithm for finding the solution that we just haven’t figured out yet?) Most computer scientists think “no,” but despite a mind-boggling number of hours invested by the brightest minds in the world, no one has ever been able to prove it one way or the other. Most practicing computer scientists spend time taking advantage of the known results about mathematical objects and structures, and rarely (if ever) have to construct a watertight proof about them. For the more theoretically-minded student, however, who enjoys probing the basis behind the tools and speculating about additional properties that might exist, devising proofs is an essential skill that can also be very rewarding. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 254 Context: ``` # CHAPTER 9. 
PROOF
- Cantor, Georg, 7, 12, 17
- capacity (of a byte), 182
- cardinality (of sets), 16, 25, 28, 66
- Carroll, Lewis, 227
- carry-in, 189
- carry-out, 189
- Cartesian product (of sets), 19, 35
- chess, 114
- child (of a node), 115
- closed interval, 61
- codomain (of a function), 45
- collectively exhaustive, 26
- combinations, 154
- combinatorics, 25, 141
- commutative, 18, 20, 71
- compilers, 114
- complement laws (of sets), 21
- complement, partial (of sets), 18
- complement, total (of sets), 18, 65, 146, 162
- complete binary tree, 121
- conclusion (of implication), 200
- conditional probability, 68, 72, 74, 78
- congruent, 173
- conjunction, 199, 208
- connected (vertices/graphs), 89, 95
- coordinates, 15
- curly brace notation, 11
- cycles, 90
- DAGs (directed acyclic graphs), 90
- data structures, 85
- Davies family, 8, 9, 26, 147, 154
- De Morgan's laws, 21, 22, 207, 208
- decimal numbers, 165, 169, 173, 178
- degree (of a vertex), 90
- depth (of a node), 115
- dequeuing, 95
- descendant (of a node), 115
- DFT (depth-first traversal), 99, 101
- Dijkstra's algorithm, 101, 104
- Dijkstra, Edsger, 101
- direct proof, 227
- directed graphs, 88, 91
- disjunction, 199, 208, 226
- disjunctive syllogism, 226
- disk sectors, 156
- distributive, 20, 208, 227
- domain (of a function), 45
- domain of discourse (?), 9, 19, 21, 27, 60, 210
- domination laws (of sets), 21
- dominoes, 234
- drinking age, 232, 234
- duplicates (in sets), 13
- edges, 86, 87
- elements (of sets), 8, 15, 23
- ellipses, 12
- empty graph, 87
- empty set, 9, 16, 21, 24, 25, 36, 114
- endorelations, 38, 93
```
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 255 Context: # 9.4.
FINAL WORD
- enqueuing, 95
- enumerating, 141, 149
- equality (for sets), 11
- equivalence (logical operator), 202, 203, 293
- Euclid, 229
- events, 60, 61
- exclusive or, 63, 199
- existential quantifier (∃), 214
- exponential growth, 123, 145
- exponential notation, 169
- extensional, 10, 11, 37, 47, 92, 203
- Facebook, 87, 97
- factorial, 147, 152
- Federalist Papers, 77
- FIFO, 95
- filesystems, 113
- Fisher, Ronald, 67
- floor operator (⌊⌋), 174
- Foreman, George, 92
- France, 103
- Frazier, Joe, 92
- free trees, 11, 242
- frequentist, 66
- full binary trees, 46
- function calls, 46
- functions, 65, 210
- Fundamental Theorem of Counting, 142, 147
- Gauss, Carl Friedrich, 236
- Goldbach, Christian, 241
- golf, 151, 154
- graphs, 55, 86
- greedy algorithm, 107
- Haken, Wolfgang, 244
- Hamilton, Alexander, 77
- handedness, 79
- Harry Potter, 35, 93
- heap, 123
- height (of a tree), 115
- Here Before, 73
- heterogeneous, 13, 15
- hexadecimal numbers, 171, 173, 176, 180
- homogeneous, 14
- HTML, 114
- identity laws (of sets), 21
- image (of a function), 48
- imaginary numbers, 17
- implies (logical operator), 200, 203, 206
- in-order traversal, 119, 131
- inclusive or, 63, 199
- independence (of events), 78
- indirect proof, 229
- inductive hypothesis, 234, 240
- inductive step, 233, 234, 240
- infinite, countably, 17
- infinite, uncountably, 17
- infinity, 12, 16, 17
- injective (function), 48
- integers (ℤ), 17, 24
- intensional, 10, 11, 37, 47, 93
- internal nodes, 115, 239
- Internet, 87
- intersection (of sets), 19, 199, 208, 227
- interval, 61
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 256 Context: # CHAPTER 9.
PROOF
- Jarnik, Vojtech, 107
- Jay, John, 77
- Jumble, 148
- Kentucky Derby, 79
- Knuth, Donald, 153
- Laplace, Pierre-Simon, 67
- Law of the Excluded Middle, 207
- Law of Total Probability, 71, 76
- least significant bit (LSB), 178
- least significant digit (LSD), 169
- leaves, 115, 128, 239
- left child, 116
- level (in a tree), 115
- lg (logarithm base 2), 124
- LIFO, 100
- Lincoln, Abraham, 86
- linked lists, 13
- links (in a graph), 87
- Liston, Sonny, 92
- locker combinations, 143
- logarithm, 124
- logical operators, 198, 199
- loops (in a graph), 87, 94
- loosely-typed languages, 14
- Madison, James, 77
- MapQuest, 87
- marking (a node), 96, 104
- mathematical induction, 231
- medical test, 75
- member (of set), 9
- middle school, 153
- minimal spanning tree, 112
- modulo operator (mod), 173, 174
- most significant bit (MSB), 178
- most significant digit, 169
- movie channel, 153, 156
- movie theatre, 73
- mutually exclusive, 26, 76, 80, 144
- n-tuples, 15
- NASCAR, 153
- negation, 200, 208, 214
- negative numbers (in binary), 183
- New York City, 176
- Newton, Isaac, 67
- nibble, 181
- nodes (of a graph), 87, 242
- not (logical operator), 200, 203
- O(log n) algorithm, 128
- O(n) algorithm, 127, 132
- object-oriented design, 114
- octal numbers, 171
- odometer rollovers, 168
- one-to-one (functions), 49
- onto (functions), 49
- or (logical operator), 18, 199, 203
- order (in sets), 13
- ordered pairs, 14, 19, 35, 45, 212
#################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 1 Context: # A First Encounter with Machine Learning **Max Welling** *Donald Bren School of Information and Computer Science* *University of California Irvine* November 4, 2011 #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 3 Context: # Contents Preface iii Learning and Intuition vii 1.
Data and Information
   1.1 Data Representation .......... 2
   1.2 Preprocessing the Data .......... 4
2. Data Visualization .......... 7
3. Learning
   3.1 In a Nutshell .......... 15
4. Types of Machine Learning
   4.1 In a Nutshell .......... 20
5. Nearest Neighbors Classification
   5.1 The Idea In a Nutshell .......... 23
6. The Naive Bayesian Classifier
   6.1 The Naive Bayes Model .......... 25
   6.2 Learning a Naive Bayes Classifier .......... 27
   6.3 Class-Prediction for New Instances .......... 28
   6.4 Regularization .......... 30
   6.5 Remarks .......... 31
   6.6 The Idea In a Nutshell .......... 32
7. The Perceptron
   7.1 The Perceptron Model .......... 34
#################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 5 Context: # Preface In winter quarter 2007, I taught an undergraduate course in machine learning at UC Irvine. While I had been teaching machine learning at a graduate level, it soon became clear that teaching the same material to an undergraduate class was a whole new challenge. Much of machine learning is built upon concepts from mathematics such as partial derivatives, eigenvalue decompositions, multivariate probability densities, and so on. I quickly found that these concepts could not be taken for granted at an undergraduate level. The situation was aggravated by the lack of a suitable textbook.
Excellent textbooks do exist for this field, but I found all of them to be too technical for a first encounter with machine learning. This experience led me to believe there was a genuine need for a simple, intuitive introduction to the concepts of machine learning. A first read to whet the appetite, so to speak; a prelude to the more technical and advanced textbooks. Hence, the book you see before you is meant for those starting out in the field who need a simple, intuitive explanation of some of the most useful algorithms that our field has to offer. Machine learning is a relatively young discipline that emerged from the general field of artificial intelligence. To build intelligent machines, researchers realized that these machines should learn from and adapt to their environment. It is simply too costly and impractical to design intelligent systems by first gathering all the expert knowledge ourselves and then hard-wiring it into a machine. For instance, after many years of intense research, we can now recognize faces in images to a high degree of accuracy. But the world has approximately 30,000 visual object categories according to some estimates (Biederman). Should we invest the same effort to build good classifiers for monkeys, chairs, pencils, axes, etc., or should we build systems that can observe millions of training images, some with labels (e.g., these pixels in the image correspond to a car) but most of them without side information? Although there currently is no system which can recognize even on the order of 1000 object categories (the best system can get).
Bert Kappen, who leads an excellent group of postdocs and students, for his hospitality. Marga, kids, UCI,... --- There are a few main aspects I want to cover from a personal perspective. Instead of trying to address every aspect of the entire field, I have chosen to present a few popular and perhaps useful tools and approaches. What will (hopefully) be significantly different from most other scientific books is the manner in which I will present these methods. I have always been frustrated by the lack of proper explanation of equations. Many times, I have been staring at a formula without the slightest clue where it came from or how it was derived. Many books also excel at stating facts in an almost encyclopedic style, without providing the proper intuition of the method. This is my primary mission: to write a book that conveys intuition. The first chapter will be devoted to why I think this is important.
It is rather an interplay between your conscious and unconscious mind: the conscious mind prepares the seeds to be planted into the unconscious part of your mind, and the unconscious mind munches on the problem "out of sight," returning promising roads to solutions to the consciousness. This process iterates until the conscious mind decides the problem is sufficiently solved, intractable, or plain dull and moves on to the next. It may be a little unsettling to learn that at least part of your thinking goes on in a part of your mind that seems inaccessible and has a very limited interface with what you think of as yourself. But it is undeniable that it is there, and it is also undeniable that it plays a role in the creative thought process. To become a creative thinker, one should learn to play this game more effectively. To do so, we should think about the language in which to represent knowledge that is most effective in terms of communication with the unconscious. In other words, what type of "interface" between conscious and unconscious mind should we use? It is probably not a good idea to memorize all the details of a complicated equation or problem. Instead, we should extract the abstract idea and capture the essence of it in a picture. This could be a movie with colors and other elements. #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 11 Context: Many people may find this somewhat experimental way to introduce students to new topics counter-productive. Undoubtedly, for many it will be. If you feel under-challenged and become bored, I recommend moving on to the more advanced textbooks, of which there are many excellent examples on the market (for a list see [books](#)). But I hope that for most beginning students, this intuitive style of writing may help them gain a deeper understanding of the ideas that I will present in the following. Above all, have fun!
#################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 14 Context: # CHAPTER 1. DATA AND INFORMATION ## Interpretation Here we seek to answer questions about the data. For instance, what property of this drug was responsible for its high success rate? Does a security officer at the airport apply racial profiling in deciding whose luggage to check? How many natural groups are there in the data? ## Compression Here we are interested in compressing the original data, i.e., reducing the number of bits needed to represent it. For instance, files in your computer can be "zipped" to a much smaller size by removing much of the redundancy in those files. Also, JPEG and GIF (among others) are compressed representations of the original pixel map. All of the above objectives depend on the fact that there is structure in the data. If data is completely random, there is nothing to predict, nothing to interpret, and nothing to compress. Hence, all tasks are somehow related to discovering or leveraging this structure. One could say that data is highly redundant and that this redundancy is exactly what makes it interesting. Take the example of natural images. If you were required to predict the color of the pixels neighboring some random pixel in an image, you would be able to do a pretty good job (for instance, 20% of an image may be blue sky, and predicting the neighbors of a blue-sky pixel is easy). Also, if we were to generate images at random, they would not look like natural scenes at all. For one, they wouldn't contain objects. Only a tiny fraction of all possible images looks "natural," and so the space of natural images is highly structured. Thus, all of these concepts are intimately related: structure, redundancy, predictability, regularity, interpretability, compressibility. They are the "food" for machine learning. Without structure, there is nothing to learn. The same thing is true for human learning.
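The link between structure and compressibility can be demonstrated directly: highly redundant bytes compress dramatically, while random bytes do not. A minimal sketch using Python's standard zlib module (the specific byte strings are made up for illustration):

```python
import os
import zlib

structured = b"blue sky " * 500             # highly redundant: 4500 bytes
random_bytes = os.urandom(len(structured))  # no structure at all

c_structured = len(zlib.compress(structured))
c_random = len(zlib.compress(random_bytes))

# Redundancy compresses away; randomness is essentially incompressible.
assert c_structured < len(structured) // 10
assert c_random > len(random_bytes) - 100
```

The repeated pattern shrinks to a tiny fraction of its size, while the random bytes actually gain a few bytes of overhead, which is zlib telling us there was nothing to learn.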
From the day we are born, we start noticing that there is structure in this world. Our survival depends on discovering and recording this structure. If I walk into this brown cylinder with a green canopy, I suddenly stop; it won't give way. In fact, it damages my body. Perhaps this holds for all these objects. When I cry, my mother suddenly appears. Our game is to predict the future accurately, and we predict it by learning its structure. ## 1.1 Data Representation What does "data" look like? In other words, what do we download into our computer? Data comes in many shapes and forms; for instance, it could be words from a document or pixels from an image. But it will be useful to convert data into a #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 16 Context: # CHAPTER 1. DATA AND INFORMATION ## 1.2 Preprocessing the Data As mentioned in the previous section, algorithms are based on assumptions and can become more effective if we transform the data first. Consider the following example, depicted in Figure 1. The algorithm we consider consists of estimating the area that the data occupy. It grows a circle starting at the origin, and at the point where the circle contains all the data we record its area. The figure shows why this will be a bad estimate: the data-cloud is not centered. If we had first centered it, we would have obtained a reasonable estimate. Although this example is somewhat simple-minded, there are many, much more interesting algorithms that assume centered data. To center data we will introduce the sample mean of the data, given by, \[ E[X_i] = \frac{1}{N} \sum_{n=1}^{N} X_{in} \tag{1.1} \] Hence, for every attribute \(i\) separately, we simply add all the attribute values across data-cases and divide by the total number of data-cases.
To transform the data so that their sample mean is zero, we set, \[ X'_{in} = X_{in} - E[X_i], \quad \forall n \tag{1.2} \] It is now easy to check that the sample mean of \(X'\) indeed vanishes. An illustration of the global shift is given in Figure 2. We also see in this figure that the algorithm described above now works much better! In a similar spirit to centering, we may also wish to scale the data along the coordinate axes in order to make the data-cloud more "spherical." Consider Figure 3. In this case the data was first centered, but the elongated shape still prevented us from using the simplistic algorithm to estimate the area covered by the data. The solution is to scale the axes so that the spread is the same in every dimension. To define this operation we first introduce the notion of sample variance, \[ V[X_i] = \frac{1}{N} \sum_{n=1}^{N} X_{in}^2 \tag{1.3} \] where we have assumed that the data was first centered. Note that this is similar to the sample mean, but now we have used the square. It is important that we have removed the sign of the data-cases (by taking the square), because otherwise positive and negative values might cancel each other out. By first taking the square, all data-cases first get mapped to the positive half of the axes (for each dimension or
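Equations (1.1)-(1.3) translate directly into a few lines of array code. A hedged sketch (here data-cases are stored as rows and attributes as columns, the transpose of the book's \(X_{in}\) indexing; the synthetic data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# 1000 data-cases (rows) with 2 attributes (columns), deliberately off-center
X = rng.normal(loc=5.0, scale=[1.0, 3.0], size=(1000, 2))

mean = X.mean(axis=0)          # sample mean, Eq. (1.1)
Xc = X - mean                  # centering, Eq. (1.2)
var = (Xc ** 2).mean(axis=0)   # sample variance of centered data, Eq. (1.3)
Xs = Xc / np.sqrt(var)         # scale each axis to equal spread

assert np.allclose(Xs.mean(axis=0), 0.0)         # sample mean vanishes
assert np.allclose((Xs ** 2).mean(axis=0), 1.0)  # unit variance per axis
```

After these two steps the data-cloud is centered and roughly spherical, which is exactly the precondition the circle-growing example needs.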
The deeper reason behind this phenomenon is the **central limit theorem**, which states that the sum of a large number of independent random variables is (under certain conditions) distributed as a Gaussian distribution. Hence, if we denote with \( \mathbf{w} \) a vector in \( \mathbb{R}^d \) and by \( \mathbf{x} \) the d-dimensional random variable, then \( y = \mathbf{w}^T \mathbf{x} \) is the value of the projection. This is clearly a weighted sum of the random variables \( x_i, \; i = 1, \ldots, d \). If we assume that the \( x_i \) are approximately independent, then we can see that their sum will be governed by the central limit theorem. Analogously, a dataset \( \{X_n\} \) can thus be visualized in one dimension by "histogramming" the values of \( y = \mathbf{w}^T \mathbf{x} \), see Figure ??. In this figure we clearly recognize the characteristic "Bell-shape" of the Gaussian distribution of projected and histogrammed data. In one sense the central limit theorem is a rather helpful quirk of nature. Many variables follow Gaussian distributions, and the Gaussian distribution is one of the few distributions which have very nice analytical properties. Unfortunately, the Gaussian distribution is also the most uninformative distribution. This notion of "uninformative" can actually be made very precise using information theory, and states: > Given a fixed mean and variance, the Gaussian density represents the least amount of information among all densities with the same mean and variance. This is rather unfortunate for our purposes, because Gaussian projections are the least revealing dimensions to look at. So in general we have to work a bit harder to see interesting structure. A large number of algorithms have been devised to search for informative projections. The simplest is "principal component analysis," or PCA for short ??. Here, interesting means dimensions of high variance.
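The claim that a random projection of high-dimensional data looks Gaussian can be checked by projecting decidedly non-Gaussian coordinates and measuring skewness and excess kurtosis, which are both near zero for a Gaussian. An illustrative sketch (all the specific numbers are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 5000
X = rng.uniform(-1.0, 1.0, size=(n, d))  # coordinates are uniform, not Gaussian

w = rng.normal(size=d)
w /= np.linalg.norm(w)
y = X @ w  # projected value y = w^T x for every data-case

# For a Gaussian, skewness and excess kurtosis are both zero.
z = (y - y.mean()) / y.std()
skew = np.mean(z ** 3)
kurt = np.mean(z ** 4) - 3.0
assert abs(skew) < 0.1 and abs(kurt) < 0.2
```

Each coordinate is flat-topped (uniform), yet the weighted sum of a hundred of them already passes these simple Gaussianity checks, which is the central limit theorem at work.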
However, high variance is not always a good measure of interestingness, and one should rather search for dimensions that are non-Gaussian. For instance, "independent components analysis" (ICA) ?? and "projection pursuit" ?? search for dimen- #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 24 Context: # CHAPTER 3. LEARNING They understood that this was a lion. They understood that all lions have these particular characteristics in common, but may differ in some other ones (like the presence of a scar someplace). Bob has another disease, which is called over-generalization. Once he has seen an object, he believes almost everything is some, perhaps twisted, instance of the same object class (in fact, I seem to suffer from this every now and then, when I think all of machine learning can be explained by this one new exciting principle). If ancestral Bob walks the savanna and has just encountered an instance of a lion and fled into a tree with his buddies, the next time he sees a squirrel he believes it is a small instance of a dangerous lion and flees into the trees again. Over-generalization seems to be rather common among small children. One of the main conclusions from this discussion is that we should neither over-generalize nor over-fit. We need to be on the edge of being just right. But just right about what? It doesn't seem there is one correct God-given definition of the category "chair." We seem to all agree, but one can surely find examples that would be difficult to classify. When do we generalize exactly right? The magic word is **PREDICTION**. From an evolutionary standpoint, all we have to do is make correct predictions about aspects of life that help us survive. Nobody really cares about the definition of lion, but we do care about our responses to the various animals (run away from a lion, chase a deer). And there are a lot of things that can be predicted in the world.
This food kills me, but that food is good for me. Drumming my fists on my hairy chest in front of a female generates opportunities for sex; sticking my hand into that yellow-orange flickering "flame" hurts my hand; and so on. The world is wonderfully predictable and we are very good at predicting it. So why do we care about object categories in the first place? Well, apparently they help us organize the world and make accurate predictions. The category *lion* is an abstraction, and abstractions help us to generalize. In a certain sense, learning is all about finding useful abstractions or concepts that describe the world. Take the concept "fluid"; it describes all watery substances and summarizes some of their physical properties. Or the concept of "weight": an abstraction that describes a certain property of objects. Here is one very important corollary for you: > "machine learning is not in the business of remembering and regurgitating observed information, it is in the business of transferring (generalizing) properties from observed data onto new, yet unobserved data." This is the mantra of machine learning that you should repeat to yourself every night before you go to bed (at least until the final exam). The information we receive from the world has two components to it: there #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 38 Context: # CHAPTER 6. THE NAIVE BAYESIAN CLASSIFIER An example of the traffic that it generates: the University of California Irvine receives on the order of 2 million spam emails a day. Fortunately, the bulk of these emails (approximately 97%) is filtered out or dumped into your spam box and will never reach your attention. How is this done? Well, it turns out to be a classic example of a classification problem: spam or ham, that's the question. Let's say that spam will receive a label 1 and ham a label 0. Our task is thus to label each new email with either 0 or 1.
What are the attributes? Rephrasing this question: what would you measure in an email to see if it is spam? Certainly, if I read "viagra" in the subject I would stop right there and dump it in the spam box. What else? Here are a few: "enlargement," "cheap," "buy," "pharmacy," "money," "loan," "mortgage," "credit," and so on. We can build a dictionary of words that we can detect in each email. This dictionary could also include word phrases such as "buy now" and "penis enlargement"; one can make these phrases as sophisticated as necessary. One could measure whether the words or phrases appear at least once, or one could count the actual number of times they appear. Spammers know about the way these spam filters work and counteract by slight misspellings of certain key words. Hence we might also want to detect words like "via gra" and so on. In fact, a small arms race has ensued where spam filters and spam generators find tricks to counteract the tricks of the "opponent." Putting all these subtleties aside for a moment, we'll simply assume that we measure a number of these attributes for every email in a dataset. We'll also assume that we have spam/ham labels for these emails, which were acquired by someone removing spam emails by hand from his/her inbox. Our task is then to train a predictor for spam/ham labels for future emails, where we have access to attributes but not to labels. The NB model is what we call a "generative" model. This means that we imagine how the data was generated, in an abstract sense. For emails, this works as follows: an imaginary entity first decides how many spam and ham emails it will generate on a daily basis. Say it decides to generate 40% spam and 60% ham. We will assume this doesn't change with time (of course it does, but we will make this simplifying assumption for now). It will then decide what the chance is that a certain word appears \( x \) times in a spam email.
For example, the word "viagra" has a chance of 96% not to appear at all, 1% to appear once, 0.9% to appear twice, etc. These probabilities are clearly different for spam and ham. "Viagra" should have a much smaller probability of appearing in a ham email (but not zero; consider that I am sending this very text to my publisher by email). Given these probabilities, we could then go on and try to generate emails that actually look like real emails, i.e., with proper sentences, but we won't need that in the following. Instead, we make the simplifying assumption that an email consists of a "bag of words" in random order.
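The generative story above (class priors plus per-class word-occurrence probabilities, under the bag-of-words assumption) leads to a classifier of only a few lines. This is a hand-rolled sketch on made-up word counts, not code from the book; `alpha` is Laplace smoothing, one simple form of regularization:

```python
import numpy as np

# Toy word-count data: each row is an email, each column the count of one
# dictionary word. Labels: 1 = spam, 0 = ham. All numbers are invented.
X = np.array([[2, 1, 0], [3, 0, 1], [0, 0, 2], [0, 1, 3], [1, 2, 0], [0, 0, 1]])
y = np.array([1, 1, 0, 0, 1, 0])

def train_nb(X, y, alpha=1.0):
    # Class priors (the "40% spam, 60% ham" step) and per-class word
    # probabilities, smoothed by alpha so no word gets probability zero.
    priors, word_probs = {}, {}
    for c in (0, 1):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        counts = Xc.sum(axis=0) + alpha
        word_probs[c] = counts / counts.sum()
    return priors, word_probs

def predict(x, priors, word_probs):
    # Bag-of-words log-likelihood: log p(c) + sum_i x_i * log p(word_i | c)
    scores = {c: np.log(priors[c]) + x @ np.log(word_probs[c]) for c in (0, 1)}
    return max(scores, key=scores.get)

priors, word_probs = train_nb(X, y)
assert predict(np.array([4, 1, 0]), priors, word_probs) == 1  # spammy counts
assert predict(np.array([0, 0, 4]), priors, word_probs) == 0  # hammy counts
```

Working in log-probabilities avoids numerical underflow when an email contains many words, and the argmax over classes is unaffected by the change of scale.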
Notation ============== - The notation log represents the logarithm in base two, while ln represents the natural logarithm. - \( \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \) indicates that random vector \(\mathbf{x}\) is distributed according to a multivariate Gaussian distribution with mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\). The multivariate Gaussian pdf is denoted as \( \mathcal{N}(\mathbf{x} | \boldsymbol{\mu}, \Sigma) \) as a function of \(\mathbf{x}\). - \( \mathbf{x} \sim U[a,b] \) indicates that \(\mathbf{x}\) is distributed according to a uniform distribution in the interval \([a, b]\). The corresponding uniform pdf is denoted as \( U(a,b) \). - \( \delta(x) \) denotes the Dirac delta function or the Kronecker delta function, as clear from the context. - \( \|\mathbf{a}\|_2^2 = \sum_{i=1}^N a_i^2 \) is the quadratic, or \( l_2 \), norm of a vector \(\mathbf{a} = [a_1, \ldots, a_N]^T\). We similarly define the \( l_1 \) norm as \( \|\mathbf{a}\|_1 = \sum_{i=1}^N |a_i| \), and the \( l_0 \) pseudo-norm \( \|\mathbf{a}\|_0 \) as the number of non-zero entries of vector \(\mathbf{a}\). - \( I \) denotes the identity matrix, whose dimensions will be clear from the context. Similarly, \( \mathbf{1} \) represents a vector of all ones. - \( \mathbb{R} \) is the set of real numbers; \( \mathbb{R}^+ \) the set of non-negative real numbers; \( \mathbb{R}^- \) the set of non-positive real numbers; and \( \mathbb{R}^N \) is the set of all vectors of \( N \) real numbers. - \( |S| \) represents the cardinality of a set \( S \). - \( \mathbf{x}_S \) represents a set of random variables \( x_k \) indexed by the integers \( k \in S \).
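The three norms in the notation list differ only in how the entries are aggregated; a quick numerical check (the vector is an arbitrary example of mine):

```python
import numpy as np

a = np.array([3.0, 0.0, -4.0])

l2 = np.sqrt(np.sum(a ** 2))  # ||a||_2, so ||a||_2^2 is the sum of squares
l1 = np.sum(np.abs(a))        # ||a||_1, sum of absolute values
l0 = np.count_nonzero(a)      # ||a||_0 pseudo-norm: number of non-zero entries

assert l2 == 5.0 and l1 == 7.0 and l0 == 2
```

The \( l_0 \) count is called a pseudo-norm because it violates the homogeneity property of a true norm: scaling the vector leaves it unchanged.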
#################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 11 Context: # Part I ## Basics #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 17 Context: # 1.3 Goals and Outline This monograph considers only passive and offline learning. This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. Unlike other texts that are focused on one particular aspect of the field, an effort has been made here to provide a broad but concise overview in which the main ideas and techniques are systematically presented. Specifically, the material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra. A prior exposure to information theory is useful but not required. Detailed discussions are provided on basic concepts and ideas, including overfitting and generalization, Maximum Likelihood and regularization, and Bayesian inference. The text also endeavors to provide intuitive explanations and pointers to advanced topics and research directions. Sections and subsections containing more advanced material that may be skipped at a first reading are marked with a star (*).
The reader will not find here discussions of computing platforms or programming frameworks, such as map-reduce, nor details on specific applications involving large data sets. These can be easily found in a vast and growing body of work. Furthermore, rather than providing exhaustive details on the myriad existing solutions in each specific category, techniques have been selected that are useful to illustrate the most salient aspects. Historical notes have also been provided only for a few selected milestone events. Finally, the monograph attempts to strike a balance between the algorithmic and theoretical viewpoints. In particular, all learning algorithms are presented in a manner that emphasizes their theoretical foundations while also providing practical insights. #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 20 Context: # Introduction from *The Computer and the Brain*, 1958. #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 50 Context: ``` # A Gentle Introduction through Linear Regression The trained ML model produces a description of approximately $$ - \sum_{n=1}^{N} \log{p(y_n | x_n, w_{ML}, \beta_{ML})} $$ bits. This description is, however, not sufficient, since the decoder of the description should also be informed about the parameters $(w_{ML}, \beta_{ML})$. Using quantization, the parameters can be described by means of a number $C(M)$ of bits that is proportional to the number of parameters, here $M + 2$. Concatenating these bits with the description produced by the ML model yields the overall description length $$ - \sum_{n=1}^{N} \log{p(y_n | x_n, w_{ML}, \beta_{ML})} + C(M). \quad (2.39) $$ MDL – in the simplified form discussed here – selects the model order $M$ that minimizes the description length (2.39). Accordingly, the term $C(M)$ acts as a regularizer.
The optimal value of $M$ for the MDL criterion is hence the result of the trade-off between the minimization of the overhead $C(M)$, which calls for a small value of $M$, and the minimization of the NLL, which decreases with $M$. Under some technical assumptions, the overhead term can often be evaluated in the form $$ \frac{K}{2} \log N + c, $$ where $K$ is the number of parameters in the model and $c$ is a constant. This expression is not directly useful in practice, but it provides intuition about the mechanism used by MDL to combat overfitting. ## 2.6 Information-Theoretic Metrics We now provide a brief introduction to information-theoretic metrics by leveraging the example studied in this chapter. As we will see in the following chapters, information-theoretic metrics are used extensively in the definition of learning algorithms. Appendix A provides a detailed introduction to information-theoretic measures in terms of inferential tasks. Here we introduce the key metrics of Kullback-Leibler (KL) divergence and entropy by examining the asymptotic behavior of ML in the regime of large $N$. The case with finite $N$ is covered in Chapter 6 (see Sec. 6.4.3). ``` #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 58 Context: # 3.1 Preliminaries We start with a brief review of some definitions that will be used throughout the chapter and elsewhere in the monograph (see [28] for more details). Readers with a background in convex analysis and calculus may just review the concept of sufficient statistic in the last paragraph. First, we define a **convex set** as a subset of \(\mathbb{R}^D\), for some \(D\), that contains all segments between any two points in the set.
Geometrically, convex sets hence cannot have “indentations.” Function \(f(x)\) is convex if its domain is a convex set and if it satisfies the inequality \(f(\lambda x_1 + (1 - \lambda)x_2) \leq \lambda f(x_1) + (1 - \lambda)f(x_2)\) for all \(x_1\) and \(x_2\) in its domain and for all \(0 \leq \lambda \leq 1\). Geometrically, this condition says that the function is “U”-shaped: the curve defining the function cannot lie above the segment obtained by connecting any two points on the curve. A function is strictly convex if the inequality above is strict for all \(x_1 \neq x_2\) and \(0 < \lambda < 1\); a concave, or strictly concave, function is defined by reversing the inequality above – it is hence “n”-shaped. The minimization of a convex (“U”) function over a convex constraint set or the maximization of a concave (“n”) function over a convex constraint set are known as convex optimization problems. For these problems, there exist powerful analytical and algorithmic tools to obtain globally optimal solutions [28]. We also introduce two useful concepts from calculus. The **gradient** of a differentiable function \(f(x)\) with \(x = [x_1, \ldots, x_D]^T \in \mathbb{R}^D\) is defined as the \(D \times 1\) vector \(\nabla f(x) = [\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_D}]^T\) containing all partial derivatives. At any point \(x\) in the domain of the function, the gradient is a vector that points in the direction of locally maximal increase of the function. The Hessian \(\nabla^2 f(x)\) is the \(D \times D\) matrix with \((i,j)\) element given by the second-order derivative \(\frac{\partial^2 f(x)}{\partial x_i \partial x_j}\). It captures the local curvature of the function. 1. A statistic is a function of the data.
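As an illustrative aside (not from the monograph), the gradient definition above can be verified with finite differences on a convex quadratic \( f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A \mathbf{x} \), whose gradient is \( A\mathbf{x} \) and whose Hessian is the constant matrix \( A \); the specific matrix and test point below are arbitrary choices for the sketch:

```python
import numpy as np

# Positive-definite A makes f(x) = 0.5 * x^T A x convex ("U"-shaped),
# with gradient A x and constant Hessian A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def f(x):
    return 0.5 * x @ A @ x

def numerical_gradient(func, x, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (func(x + e) - func(x - e)) / (2 * eps)
    return g

x0 = np.array([1.0, -2.0])
grad = numerical_gradient(f, x0)   # should be close to A @ x0 = [1.0, -1.5]
```

The same finite-difference idea applied to the gradient itself recovers the Hessian, which for this quadratic is simply \( A \) at every point.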
#################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 119 Context: # 5 ## Statistical Learning Theory* Statistical learning theory provides a well-established theoretical framework in which to study the trade-off between the number \( N \) of available data points and the generalization performance of a trained machine. The approach formalizes the notions of model capacity, estimation error (or generalization gap), and bias that underlie many of the design choices required by supervised learning, as we have seen in the previous chapters. This chapter is of mathematical nature, and it departs from the algorithmic focus of the text so far. While it may be skipped at a first reading, the chapter sheds light on the key empirical observations made in the previous chapters relative to learning in a frequentist setup. It does so by covering the theoretical underpinnings of supervised learning within the classical framework of statistical learning theory. To this end, the chapter contains a number of formal statements with proofs. The proofs have been carefully selected in order to highlight and clarify the key theoretical ideas. This chapter follows mostly the treatment in [133]. #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 172 Context: # 7 ## Probabilistic Graphical Models As we have seen in the previous chapters, probabilistic models are widely used in machine learning. Using Fig. 6.1 as an example, we have encountered both directed and undirected models, which have been used to carry out supervised and unsupervised learning tasks. Graphical models encode structural information about the random variables of interest, both observed and latent. They hence provide a principled way to define parametric probabilistic models with desired features. 
The selection of a probabilistic graphical model hence follows the same general rules that have been discussed so far: A more specialized, or structured, model may help reduce overfitting, and hence the generalization gap. This is done, as we will see, by reducing the number of parameters to be learned. On the flip side, specialization may come at the cost of an irrecoverable bias. In this chapter, we provide an introduction to the vast field of probabilistic graphical models, which is a powerful framework that allows us to represent and learn structured probabilistic models. The goal here is to introduce the main concepts and tools, while referring to the extensive treatments in [81, 15, 104, 151] for additional information. #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 213 Context: # Appendix A: Information Measures In this appendix, we provide a principled and intuitive introduction to information measures that builds on inference, namely estimation and hypothesis testing. We focus on entropy, mutual information, and divergence measures. We also concentrate on discrete r.v.s. In the monograph, we have taken the pragmatic approach of extending the definitions to continuous variables by substituting sums with integrals. It is worth noting that this approach does not come with any practical complications when dealing with mutual information and divergence. Instead, the continuous version of the entropy, known as differential entropy, should be treated with care, as it does not satisfy some key properties of the entropy such as non-negativity. ## A.1 Entropy As proposed by Claude Shannon, the amount of information received from the observation of a discrete random variable \( X \sim p(x) \) defined over a finite alphabet should be measured by the amount of uncertainty about its value prior to its measurement.
To this end, we consider the problem of estimating the value of \( x \) when one only knows the following: ########## """QUERY: why is the book titled a cool brisk walk through discrete math?""" Consider the chat history for relevant information. Use all information included. Use as much tokens as needed. Important: If you find information separated by a | in the context, it is a table formatted in Markdown. Important: Take a look at the QUERY and only the QUERY. If this is vague or unclear, please ignore everything and ask a follow-up question instead!

Final Files Sources:
- A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Pages 1, 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 15, 18, 21, 23, 24, 25, 30, 33, 36, 39, 40, 43, 52, 55, 60, 64, 67, 68, 73, 75, 88, 91, 95, 98, 101, 103, 104, 109, 112, 113, 115, 118, 119, 128, 135, 142, 145, 149, 150, 151, 156, 158, 160, 161, 163, 165, 168, 171, 173, 175, 200, 205, 219, 224, 226, 228, 231, 234, 236, 237, 238, 240, 244, 251, 252, 254, 255, 256
- A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Pages 1, 3, 5, 7, 9, 11, 14, 16, 20, 24, 38
- A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Pages 6, 8, 11, 17, 20, 50, 58, 119, 172, 213

================================================== **Elapsed Time: 2.48 seconds** ================================================== FINAL ANSWER Answer: The book "A Cool Brisk Walk Through Discrete Mathematics" by Stephen Davies is titled as such to reflect the author's intention of providing a quick, engaging, and accessible introduction to the subject of discrete mathematics.
Here are the key reasons based on the provided context:

1. **Approachability and Engagement**: The author aims to make the subject approachable and engaging, akin to taking a refreshing walk through a forest. This is evident from the preface, where he describes the book as a "quick guided tour" of introductory-level discrete mathematics, pointing out notable features and instilling a sense of appreciation and awe for the subject.
2. **Contrast with Traditional Textbooks**: Davies contrasts his book with traditional discrete mathematics textbooks, which he describes as "thick, dense volumes packed with lots of equations and proofs." His book, by contrast, is meant to be less intimidating and more inviting, much like a brisk walk that is enjoyable and not overly strenuous.
3. **Target Audience**: The book is tailored for computer science students and practitioners who need to use discrete mathematics tools rather than derive them. The emphasis is on gaining familiarity and practical experience with fundamental concepts, rather than on becoming expert theorem provers. This practical, user-friendly approach is akin to a brisk walk that covers essential ground quickly and efficiently.
4. **Metaphorical Imagery**: The imagery of a "cool brisk walk" through a "pretty forest" conveys the idea of exploring the landscape of discrete mathematics in a way that is both refreshing and enlightening. The cover art, depicting a serene forest scene, reinforces this metaphor, suggesting that the journey through the book will be pleasant and invigorating.
5. **Educational Philosophy**: Davies believes that learning should be enjoyable and that students benefit from an appreciation of the richness and rigor of the material. The title reflects his philosophy of making learning a delightful and stimulating experience, much like a brisk walk that leaves one feeling energized and inspired.
In summary, the title "A Cool Brisk Walk Through Discrete Mathematics" encapsulates the author's goal of providing a concise, engaging, and accessible introduction to discrete mathematics, making the learning process enjoyable and less daunting for students and practitioners. ================================================== **Elapsed Time: 5.16 seconds** ==================================================