INITIALIZATION
Knowledgebase: ki-dev-large
Base Query: can you tell me what is discrete math in 500 words?
Model: gpt-4o
Use Curl?: None
==================================================
**Elapsed Time: 0.00 seconds**
==================================================
ROUTING
Query type: summary
==================================================
**Elapsed Time: 2.06 seconds**
==================================================
RAG PARAMETERS
Max Context To Include: 100
Lowest Score to Consider: 0.1
==================================================
**Elapsed Time: 0.14 seconds**
==================================================
VECTOR SEARCH ALGORITHM TO USE
Use MMR search?: True
Use Similarity search?: False
==================================================
**Elapsed Time: 0.10 seconds**
==================================================
VECTOR SEARCH DONE
==================================================
**Elapsed Time: 6.36 seconds**
==================================================
PRIMER
Primer: You are Simon, a highly intelligent personal assistant in a system called KIOS. You are a chatbot that can read knowledgebases through the "CONTEXT" that is included in the user's chat message. Your role is to act as an expert at summarization and analysis. In your responses to enterprise users, prioritize clarity, trustworthiness, and appropriate formality. Be honest by admitting when a topic falls outside your scope of knowledge, and suggest alternative avenues for obtaining information when necessary. Make effective use of chat history to avoid redundancy and enhance response relevance, continuously adapting to integrate all necessary details in your interactions. Use as many tokens as possible to provide a detailed response.
==================================================
**Elapsed Time: 0.20 seconds**
==================================================
FINAL QUERY
Final Query: CONTEXT: ########## File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 1 Context: # A Cool Brisk Walk Through Discrete Mathematics **Version 2.2.1** **Stephen Davies, Ph.D.** ## Table of Contents 1. Introduction 2. Concepts - Sets - Functions - Relations 3. Applications 4. Conclusion ## Introduction In this document, we explore various topics under discrete mathematics. ## Concepts ### Sets - Definition - Notation - Operations ### Functions - Definition - Types of functions - Examples ### Relations - Definition - Properties

| Concept | Description |
|-----------|-----------------------------------|
| Set | A collection of distinct objects |
| Function | A mapping from one set to another |
| Relation | A connection between two sets |

## Applications The concepts of discrete mathematics can be applied in various fields, including computer science, information technology, and combinatorial design. ## Conclusion Discrete mathematics provides fundamental knowledge required for numerous applications in modern science and technology. Image Analysis: ### Analysis of the Attached Visual Content #### 1. Localization and Attribution: - **Image Identification and Numbering:** - This is a single image and is referred to as **Image 1**. #### 2. Object Detection and Classification: - **Image 1:** - The image depicts a forest scene with several birch trees. - **Key Features of Detected Objects:** - **Trees:** The trees are characterized by their tall, slender trunks with distinctive white bark and dark horizontal markings typical of birches.
- **Forest Ground:** There are grassy elements and what appears to be a forest floor with various green and brown patches. #### 3. Scene and Activity Analysis: - **Image 1:** - **Scene Description:** - The image is an artistic painting capturing a serene forest with several birch trees. - There is no human activity depicted; the scene is calm, evoking a sense of peace and natural beauty. #### 4. Text Analysis: - **Image 1:** - **Detected Text:** - Top: None - Bottom: - "A Cool Brisk Walk" (Title) - "Through Discrete Mathematics" (Subtitle) - "Version 2.2.1" (Version Information) - "Stephen Davies, Ph.D." (Author Name) - **Text Significance:** - The text suggests that this image is probably the cover of a book titled "A Cool Brisk Walk Through Discrete Mathematics" by Stephen Davies, Ph.D. The version number indicates it is version 2.2.1, signifying a possibly updated or revised edition. #### 8. Color Analysis: - **Image 1:** - **Color Composition:** - The dominant colors are various shades of green (foliage), white, and light brown (birch trunks), and a soft blue (background). - **Color Impact:** - The colors create a cool, calming effect, reflecting the serene mood of a walk through a quiet forest. #### 9. Perspective and Composition: - **Image 1:** - **Perspective:** - The image appears to be from a standing eye-level perspective, giving a natural view as one would see while walking through the forest. - **Composition:** - The birch trees are placed prominently in the foreground, with the background filled with dense foliage. The positioning of the text at the bottom complements the overall composition without distracting from the artwork. #### 10. Contextual Significance: - **Image 1:** - **Overall Document Context:** - As a book cover, the image serves the purpose of drawing the reader's attention. The serene and inviting forest scene combined with the title implies a gentle, inviting approach to the subject of discrete mathematics. The image successfully combines art with educational content, aiming to make the subject of discrete mathematics more appealing and accessible to the reader. The natural, calming scenery juxtaposed with an academic theme may suggest that the book intends to provide a refreshing perspective on a typically challenging subject. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 2 Context: Copyright © 2023 Stephen Davies. ## University of Mary Washington **Department of Computer Science** James Farmer Hall 1301 College Avenue Fredericksburg, VA 22401 --- Permission is granted to copy, distribute, transmit and adapt this work under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/). The accompanying materials at [www.allthemath.org](http://www.allthemath.org) are also under this license. If you are interested in distributing a commercial version of this work, please contact the author at [stephen@umw.edu](mailto:stephen@umw.edu). The LaTeX source for this book is available from: [https://github.com/divilian/cool-brisk-walk](https://github.com/divilian/cool-brisk-walk). --- Cover art copyright © 2014 Elizabeth M. Davies. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 3 Context: # Contents at a glance - **Contents at a glance** …… i - **Preface** …… iii - **Acknowledgements** …… v 1.
**Meetup at the trailhead** …… 1 2. **Sets** …… 7 3. **Relations** …… 35 4. **Probability** …… 59 5. **Structures** …… 85 6. **Counting** …… 141 7. **Numbers** …… 165 8. **Logic** …… 197 9. **Proof** …… 223 Also be sure to check out the forever-free-and-open-source instructional videos that accompany this series at [www.allthemath.org](http://www.allthemath.org)! #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 5 Context: # Preface Discrete math is a popular book topic — start Googling around and you’ll find a zillion different textbooks about it. Take a closer look, and you’ll discover that most of these are pretty thick, dense volumes packed with lots of equations and proofs. They’re principled approaches, written by mathematicians and (seemingly) for mathematicians. I speak with complete frankness when I say I’m comforted to know that the human race is well covered in this area. We need smart people who can derive complex expressions and prove theorems from scratch, and I’m glad we have them. Your average computer science practitioner, however, might be better served by a different approach. There are elements to the discrete math mindset that a budding software developer needs experience with. This is why discrete math is (properly, I believe) part of the mandatory curriculum for most computer science undergraduate programs. But for future programmers and engineers, the emphasis should be different than it is for mathematicians and researchers in computing theory. A practical computer scientist mostly needs to be able to use these tools, not to derive them. She needs familiarity, and practice, with the fundamental concepts and the thought processes they involve. The number of times the average software developer will need to construct a proof in graph theory is probably near zero. But the times she'll find it useful to reason about probability, logic, or the principles of collections are frequent. I believe the majority of computer science students benefit most from simply gaining an appreciation for the richness and rigor of #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 6 Context: # PREFACE this material, what it means, and how it impacts their discipline. Becoming an expert theorem prover is not required, nor is deriving closed-form expressions for the sizes of trees with esoteric properties. Basic fluency with each topic area, and an intuition about when it can be applied, is the proper aim for most of those who would go forward and build tomorrow's technology. To this end, the book in your hands is a quick guided tour of introductory-level discrete mathematics. It's like a cool, brisk walk through a pretty forest. I point out the notable features of the landscape and try to instill a sense of appreciation and even of awe. I want the reader to get a feel for the lay of the land and a little exercise. If the student acquires the requisite vocabulary, gets some practice playing with the toys, and learns to start thinking in terms of the concepts here described, I will count it as a success. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 9 Context: # Chapter 1 ## Meetup at the trailhead Before we set out on our “cool, brisk walk,” let’s get oriented. What is discrete mathematics, anyway? Why is it called that? What does it encompass?
And what is it good for? Let’s take the two words of the subject, in reverse order. First, **math**. When most people hear “math,” they think “numbers.” After all, isn’t math the study of quantity? And isn’t that the class where we first learned to count, add, and multiply? Mathematics certainly has its root in the study of numbers — specifically, the “natural numbers” (the integers from 1 on up) that fascinated the ancient Greeks. Yet math is broader than this, almost to the point where numbers can be considered a special case of something deeper. In this book, when we talk about trees, sets, or formal logic, there might not be a number in sight. Math is about **abstract, conceptual objects that have properties,** and the implications of those properties. An “object” can be any kind of “thought material” that we can define and reason about precisely. Much of math deals with questions like, “Suppose we defined a certain kind of thing that had certain attributes. What would be the implications of this, if we reasoned it all the way out?” The “thing” may or may not be numerical, whatever it turns out to be. Like a number, however, it will be crisply defined, have certain known aspects to it, and be capable of combining with other things in some way. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 10 Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD Fundamental to math is that it deals with the **abstract**. Abstract, which is the opposite of concrete, essentially means something that can’t be perceived with the senses. A computer chip is concrete: you can touch it, you can see it. A number is not; nor is a function, a binary tree, or a logical implication. The only way to perceive these things is with the power of the mind. We will write expressions and draw pictures of many of our mathematical structures in order to help visualize them, and nearly everything we study will have practical applications whereby the abstractness gets grounded in concreteness for some useful purpose. But the underlying mathematical entity remains abstract and ethereal — only accessible to the mind’s eye. We may use a pencil to form the figure “5” on a piece of paper, but that is only a concrete manifestation of the underlying concept of “five-ness.” Don’t mistake the picture or the symbol for the thing itself, which always transcends any mere physical representation. The other word in the name of our subject is **“discrete”** (not to be confused with “discreet,” which means something else entirely). The best way to appreciate what discrete means is to contrast it with its opposite, **continuous**. Consider the following list:

| Discrete | Continuous |
|-------------------------------|------------------------|
| whole numbers (\( \mathbb{Z} \)) | real numbers (\( \mathbb{R} \)) |
| int | double |
| digital | analog |
| quantum | continuum |
| counting | measuring |
| number theory | analysis |
| \( \Sigma \) | \( \int \) |
| \( - \) | \( \frac{d}{dx} \) |

What do the left-hand entries have in common? They describe things that are measured in crisp, distinct intervals, rather than varying smoothly over a range. Discrete things jump suddenly from position to position, with rigid precision.
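Readers who know a little programming can make the last two rows of that table concrete. Here is a minimal Python sketch — my own illustration, not the book's — in which a plain loop-style sum stands in for \( \int \), and plain subtraction stands in for \( \frac{d}{dx} \):

```python
# Discrete: sum a fixed sequence of entries, one at a time, like in a loop.
values = [1, 2, 3, 4, 5]
total = sum(values)                      # Sigma-style: 15

# Continuous, approximated discretely: "integrate x^2 from 3 to 5" by
# adding up thousands of tiny vertical strips of width dx.
n = 20000
dx = (5 - 3) / n
area = sum((3 + k * dx) ** 2 * dx for k in range(n))   # close to 98/3 = 32.667

# Discrete slope between two plotted points: subtract the y-values and
# divide by the x-distance -- the discrete cousin of d/dx.
def slope(p, q):
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)

print(total, round(area, 2), slope((1, 1), (3, 9)))    # 15 32.67 4.0
```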
If you’re 5 feet tall, you might some day grow to 5.3 feet; but though there might be 5 #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 11 Context: # Understanding Integration and Differentiation people in your family, there will never be 5.3 of them (although there could be 6 someday). The last couple of entries on this list are worth a brief comment. They are math symbols, some of which you may be familiar with. On the right side — in the continuous realm — are \( \int \) and \( \frac{d}{dx} \), which you’ll remember if you’ve taken calculus. They stand for the two fundamental operations of integration and differentiation. Integration, which can be thought of as finding “the area under a curve,” is basically a way of adding up a whole infinite bunch of numbers over some range. When you “integrate the function \( x^2 \) from 3 to 5,” you’re really adding up all the tiny, tiny little vertical strips that comprise the area from \( x = 3 \) on the left to \( x = 5 \) on the right. Its corresponding entry in the left column of the table is \( \Sigma \), which is just a short-hand for “sum up a bunch of things.” Integration and summation are equivalent operations; it’s just that when you integrate, you’re adding up all the (infinitely many) slivers across the real-line continuum. When you sum, you’re adding up a fixed sequence of entries, one at a time, like in a loop. \( \Sigma \) is just the discrete “version” of \( \int \). The same sort of relationship holds between ordinary subtraction (\(-\)) and differentiation (\(\frac{d}{dx}\)). If you’ve plotted a bunch of discrete points on \( x \)-\( y \) axes, and you want to find the slope between two of them, you just subtract their \( y \) values and divide by the \( x \) distance between them. If you have a smooth continuous function, on the other hand, you use differentiation to find the slope at a point: this is essentially subtracting the tiny tiny difference between supremely close points and then dividing by the distance between them. Don’t worry; you don’t need to have fully understood any of the integration or differentiation stuff I just talked about, or even to have taken calculus yet. I’m just trying to give you some feel for what “discrete” means, and how the dichotomy between discrete and continuous really runs through all of math and computer science. In this book, we will mostly be focusing on discrete values and structures, which turn out to be of more use in computer science. That’s partially because, as you probably know, computers #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 12 Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD themselves are discrete, and can only store and compute discrete values. There can be many of them — megabytes, gigabytes, terabytes — but each value stored is fundamentally comprised of bits, each of which has a value of either 0 or 1. This is unlike the human brain, by the way, whose neuronal synapses communicate based on the continuous quantities of chemicals present in their axons. So I guess “computer” and “brain” are another pair of entities we could add to our discrete vs. continuous list. There’s another reason, though, why discrete math is of more use to computer scientists than continuous math is, beyond just the bits-and-bytes thing. Simply put, computers operate algorithmically.
They carry out programs in step-by-step, iterative fashion. First do this, then do that, then move on to something else. This mechanical execution, like the ticking of a clock, permeates everything the computer can do, and everything we can tell it to do. At a given moment in time, the computer has completed step 7, but not step 8; it has accumulated 38 values, but not yet 39; its database has exactly 15 entries in it, no more and no less; it knows that after accepting this friend request, there will be exactly 553 people in your set of friends. The whole paradigm behind reasoning about computers and their programs is discrete, and that’s why we computer scientists find different problems worth thinking about than most of the world did a hundred years ago. But it’s still math. It’s just discrete math. There’s a lot to come, so limber up and let me know when you’re ready to hit the road. ## 1.1 Exercises Use an index card or a piece of paper folded lengthwise, and cover up the right-hand column of the exercises below. Read each exercise in the left-hand column, answer it in your mind, then slide the index card down to reveal the answer and see if you’re right! For every exercise you missed, figure out why you missed it before moving on. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 13 Context: # 1.1 EXERCISES 1. **What’s the opposite of concrete?** Abstract. 2. **What’s the opposite of discrete?** Continuous. 3. **Consider a quantity of water in a glass. Would you call it abstract, or concrete? Discrete, or continuous?** Concrete, since it’s a real entity you can experience with the senses. Continuous, since it could be any number of ounces (or liters, or tablespoons, or whatever). The amount of water certainly doesn’t have to be an integer. (Food for thought: since all matter is ultimately comprised of atoms, are even substances like water discrete?) 4. **Consider the number 27. Would you call it abstract, or concrete? Discrete, or continuous?** Abstract, since you can’t see or touch or smell “twenty-seven.” Probably discrete, since it’s an integer, and when we think of whole numbers we think “discrete.” (Food for thought: in real life, how would you know whether I meant the integer “27” or the decimal number “27.00”? And does it matter?) 5. **Consider a bit in a computer’s memory. Would you call it abstract, or concrete? Discrete, or continuous?** Clearly it’s discrete. Abstract vs. concrete, though, is a little tricky. If we’re talking about the actual transistor and capacitor that’s physically present in the hardware, holding a tiny charge in some little chip, then it’s concrete. But if we’re talking about the value “1” that is conceptually part of the computer’s currently executing state, then it’s really abstract, just like 27 was. In this book, we’ll always be talking about bits in this second, abstract sense. 6. **If math isn’t just about numbers, what else is it about?** Any kind of abstract object that has properties we can reason about. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 15 Context: # Chapter 2 ## Sets The place from which we'll start our walk is a body of mathematics called "set theory." Set theory has an amazing property: it's so simple and applicable that almost all the rest of mathematics can be based on it!
This is all the more remarkable because set theory itself came along pretty late in the game (as things go) — it was singlehandedly invented by one brilliant man, Georg Cantor, in the 1870s. That may seem like a long time ago, but consider that by the time Cantor was born, mankind had already accumulated an immense wealth of mathematical knowledge: everything from geometry to algebra to calculus to prime numbers. Set theory was so elegant and universal, though, that after it was invented, nearly everything in math was redefined from the ground up to be couched in the language of sets. It turns out that this simple tool is an amazingly powerful way to reason about mathematical concepts of all flavors. Everything else in this book stands on set theory as a foundation. Cantor, by the way, went insane as he tried to extend set theory to fully encompass the concept of infinity. Don't let that happen to you! #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 18 Context: # CHAPTER 2. SETS set theory? Fuzzy sets change this membership assumption: items can indeed be “partially in” a set. One could declare, for instance, that Dad is “10% female,” which means he’s only 10% in the \( F \) set. That might or might not make sense for gender, but you can imagine that if we defined a set \( T \) of “the tall people,” such a notion might be useful. At any rate, this example illustrates a larger principle which is important to understand: in math, things are the way they are simply because we’ve decided it’s useful to think of them that way. If we decide there’s a different useful way to think about them, we can define new assumptions and proceed according to new rules. It doesn’t make any sense to say “sets are (or aren’t) really fuzzy,” because there is no “really.” All mathematics proceeds from whatever mathematicians have decided is useful to define, and any of it can be changed at any time if we see fit. ## 2.2 Defining sets There are two ways to define a set: extensionally and intensionally. I’m not saying there are two kinds of sets; rather, there are simply two ways to specify a set. To define a set extensionally is to list its actual members. That’s what we did when we said \( P = \{ Dad, Mom \} \) above. In this case, we’re not giving any “meaning” to the set; we’re just mechanically spelling out what’s in it. The elements Dad and Mom are called the **extension** of the set \( P \). The other way to specify a set is intensionally, which means to describe its meaning. Another way to think of this is specifying a rule by which it can be determined whether or not a given element is in the set. If I say “Let \( P \) be the set of all parents,” I am defining \( P \) intensionally. I haven’t explicitly said which specific elements of the set are in \( P \). I’ve just given the meaning of the set, from which you can figure out the extension. We call “parent-ness” the **intension** of \( P \). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 21 Context: 2.4 Sets are not arrays ======================= If you’ve done some computer programming, you might see a resemblance between sets and the collections of items often used in a program: arrays, perhaps, or linked lists. To be sure, there are some similarities.
But there are also some very important differences, which must not be overlooked: - **No order.** As previously mentioned, there is no order to the members of a set. “{Mom, Dad}” is the same set as “{Dad, Mom}”. In a computer program, of course, most arrays or lists have first, second, and last elements, and an index number assigned to each. - **No duplicates.** Suppose \( M \) is the set of all males. What would it possibly mean to say \( M = \{T.J., T.J., Johnny\} \)? Would that mean that “T.J. is twice the man that Johnny is”? This is obviously nonsensical. The set \( M \) is based on a property: maleness. Each element of \( M \) is either male, or it isn't. It can't be “male three times.” Again, in an array or linked list, you could certainly have more than one copy of the same item in different positions. - **Infinite sets.** 'Nuff said. I’ve never seen an array with infinitely many elements, and neither will you. - **Untyped.** Most of the time, an array or other collection in a computer program contains elements of only a single type; it’s an array of integers, or a linked list of Customer objects, for example. This is important because the program often needs to treat all elements in the collection the same way. Perhaps it needs to loop over the array to add up all the numbers, or iterate through a linked list and search for customers who have not placed an order in the last six months. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 23 Context: For this reason, we’ll be very careful to use curly braces to denote sets, and parentheses to denote ordered pairs. By the way, although the word “coordinate” is often used to describe the elements of an ordered pair, that’s really a geometry-centric word that implies a visual plot of some kind. Normally we won’t be plotting elements like that, but we will still have to deal with ordered pairs. I’ll just call the constituent parts “elements” to make it more general. Three-dimensional points need ordered triples \( (x, y, z) \), and it doesn’t take a rocket scientist to deduce that we could extend this to any number of elements. The question is what to call them, and you do sort of sound like a rocket scientist (or other generic nerd) when you say tuple. Some people rhyme this word with “Drupal,” and others with “couple,” by the way, and there seems to be no consensus. If you have an ordered pair-type thing with 5 elements, therefore, it’s a 5-tuple (or a quintuple). If it has 117 elements, it’s a 117-tuple, and there’s really nothing else to call it. The general term (if we don’t know or want to specify how many elements) is n-tuple. In any case, it’s an ordered sequence of elements that may contain duplicates, so it’s very different than a set.
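Python's built-in `set`, `list`, and `tuple` types happen to mirror these distinctions closely. A quick sketch — the example data is mine, not the book's — making each point executable:

```python
# No order: two listings of the same members are the same set...
assert {"Mom", "Dad"} == {"Dad", "Mom"}
# ...but not the same list.
assert ["Mom", "Dad"] != ["Dad", "Mom"]

# No duplicates: listing T.J. twice doesn't make him "twice the man."
M = {"T.J.", "T.J.", "Johnny"}
assert len(M) == 2

# Untyped (heterogeneous): a set can mix kinds of elements freely.
grab_bag = {42, "UMW", (3.5, "image.png")}

# Tuples are the opposite: ordered, and duplicates are allowed.
assert (4, 9) != (9, 4)
quintuple = (1, 1, 2, 3, 5)      # a 5-tuple, duplicates and all
```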
## 2.6 Sets of Sets Sets are heterogeneous — a single set can contain four universities, seven integers, and an image — and so it might occur to you that they can contain other sets as well. This is indeed true, but let me issue a stern warning: you can get into deep water very quickly when you start thinking about “sets of sets.” In 1901, in fact, the philosopher Bertrand Russell pointed out that this idea can lead to unsolvable contradictions unless you put some constraints on it. What became known as “Russell’s Paradox” famously goes as follows: consider the set \( R \) of all sets that do not have themselves #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 24 Context: # CHAPTER 2. SETS as members³. Now is \( R \) a member of itself, or isn’t it? Either way you answer turns out to be wrong (try it!) which means that this whole setup must be flawed at some level. The good news is that as long as you don’t deal with this kind of self-referential loop (“containing yourself as a member”) then it’s pretty safe to try at home. Consider this set: \[ V = \{ 3, 5, \{ 5, 4 \}, 2 \} \] This set has four (not five) members. Three of \( V \)’s members are integers: 2, 3, and 5. The other one is a set (with no name given). That other set, by the way, has two members of its own: 4 and 5. If you were asked, “is \( 4 \in V \)?” the answer would be no. As a corollary to this, there’s a difference between \[ \emptyset \] and \[ \{ \emptyset \} \] The former is a set with no elements. The latter is a set with one element: and that element just happens to be a set with nothing in it. ## 2.7 Cardinality When we talk about the number of elements in a set, we use the word cardinality. You’d think we could just call it the “size” of the set, but mathematicians sometimes like words that sound cool. The cardinality of \( M \) (the set of males, where the Davies family is the domain of discourse) is 3, because there are three elements in it. The cardinality of the empty set \( \emptyset \) is 0. The cardinality of the set of all integers is \( \aleph_0 \). Simple as that. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 25 Context: 2.8 Some special sets ===================== In addition to the empty set, there are symbols for some other common sets, including: - **Z** — the integers (positive, negative, and zero) - **N** — the natural numbers (positive integers and zero) - **Q** — the rational numbers (all numbers that can be expressed as an integer divided by another, nonzero, integer) - **R** — the real numbers (all numbers that aren't imaginary, even decimal numbers that aren't rational) The cardinality of all these sets is infinity, although, as I alluded to previously, \(\mathbb{R}\) is in some sense "greater than" \(\mathbb{N}\). For the curious, we say that \(\mathbb{N}\) is a countably infinite set, whereas \(\mathbb{R}\) is uncountably infinite. Speaking very loosely, this can be thought of this way: if we start counting up all the natural numbers, 0, 1, 2, 3, 4, ..., we will never get to the end of them. But at least we can start counting. With the real numbers, we can't even get off the ground. Where do you begin? Starting with 0 is fine, but then what's the "next" real number? Choosing anything for your second number inevitably skips a lot in between. Once you've digested this, I'll spring another shocking truth on you: \( |\mathbb{Q}| \) is actually equal to \( |\mathbb{N}| \), not greater than it as you might presume. Cantor came up with an ingenious numbering scheme whereby all the rational numbers — including 3, −9, and \( \frac{1}{17} \) — can be listed off regularly, in order, just like the integers can. And so \( |\mathbb{Q}| = |\mathbb{N}| \neq |\mathbb{R}| \). This kind of stuff can blow your mind.
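That "ingenious numbering scheme" is concrete enough to program. Here is a Python sketch of the diagonal walk — my own illustration, not code from the book — as a generator that eventually reaches every positive rational exactly once:

```python
from fractions import Fraction

def rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    seen = set()
    s = 2                       # walk the diagonals where p + q == s
    while True:
        for p in range(1, s):
            q = s - p
            r = Fraction(p, q)  # Fraction(2, 4) reduces to Fraction(1, 2)
            if r not in seen:   # skip values we've already listed
                seen.add(r)
                yield r
        s += 1

gen = rationals()
print([next(gen) for _ in range(8)])
# [Fraction(1, 1), Fraction(1, 2), Fraction(2, 1), Fraction(1, 3), ...]
```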
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 27 Context: # 2.9. COMBINING SETS \( \overline{B} \) is “all the things *period* that aren't in \( B \).” Of course, “all the things period” means “all the things that we're currently talking about.” The domain of discourse \( \Omega \) is very important here. If we're talking about the Davies family, we would say that \( \overline{M} = \{ \text{Mom}, \text{Lizzy} \} \), because those are all the Davies who aren’t male. If, on the other hand, \( \Omega \) is “the grand set of absolutely everything,” then not only is Mom a member of \( \overline{M} \), but so is the number 12, the French Revolution, and my nightmare last Tuesday about a rabid platypus. - **Cartesian product** (\( \times \)). Looks like multiplication, but very different. When you take the Cartesian product of two sets \( A \) and \( B \), you don’t even get the elements from the sets in the result. Instead, you get ordered pairs of elements. These ordered pairs represent each combination of an element from \( A \) and an element from \( B \). For instance, suppose \( A = \{ \text{Bob}, \text{Dave} \} \) and \( B = \{ \text{Jenny}, \text{Gabrielle}, \text{Tiffany} \} \). Then: \[ A \times B = \{ (\text{Bob}, \text{Jenny}), (\text{Bob}, \text{Gabrielle}), (\text{Bob}, \text{Tiffany}), (\text{Dave}, \text{Jenny}), (\text{Dave}, \text{Gabrielle}), (\text{Dave}, \text{Tiffany}) \} \] Study that list. The first thing to realize is that it consists of neither guys nor girls, but of ordered pairs. (Clearly, for example, \( \text{Jenny} \notin A \times B \).) Every guy appears exactly once with every girl, and the guy is always the first element of the ordered pair. Since we have two guys and three girls, there are six (\( 2 \times 3 \)) elements in the result, which is an easy way to remember the \( \times \) sign that represents Cartesian product. (Do not, however, make the common mistake of thinking that \( A \times B \) is a number rather than a set. The cardinality of the set, of course, is 6, so it's appropriate to write \( |A \times B| = 6 \).) ## Laws of Combining Sets There are a bunch of handy facts that arise when combining sets using the above operators. The important thing is that these are all easily seen just by thinking about them for a moment. Put another way, these aren’t facts to memorize; they’re facts to look at and see. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 30 Context: # CHAPTER 2. SETS Again, it's best understood with a specific example. Let \( X \) be the set of all known murderers. Let \( Y \) be the set of all known thieves. Let’s say you’re renting a house and want to make sure you don’t have any shady characters under the roof. Now, as a landlord, you don’t want any thieves or murderers renting your property. So who are you willing to rent to? Answer: if \( \Omega \) is the set of all people, you are willing to rent to \( \overline{X \cup Y} \). Why that? Because if you take \( X \cup Y \), that gives you all the undesirables: people who are either murderers or thieves (or both). You don’t want to rent to any of them. In fact, you want to rent to the complement of that set; namely, “anybody else.” Putting an overbar on that expression gives you all the non-thieves and non-murderers. This, it turns out, is one form of De Morgan’s law: \[ \overline{X \cup Y} = \overline{X} \cap \overline{Y} \] Very well. But now look at the right-hand side of the equation. \( \overline{X} \) gives you the non-murderers. \( \overline{Y} \) gives you the non-thieves. Now in order to get acceptable people, you want to rent only to someone who’s in both groups. Put another way, they have to be both a non-thief and a non-murderer in order for you to rent to them. Therefore, they must be in the intersection of the non-thief group and the non-murderer group. Therefore, the two sides of this equation are the same.
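Since these are "facts to look at and see," they're also easy to spot-check by brute force over a tiny domain of discourse. A Python sketch, with invented names:

```python
omega = {"Alice", "Bob", "Carol", "Dave", "Eve"}   # the domain of discourse
X = {"Bob", "Eve"}            # known murderers (hypothetical, of course)
Y = {"Eve", "Carol"}          # known thieves

def comp(s):
    """Complement relative to the domain of discourse."""
    return omega - s

# De Morgan: the complement of the union equals the intersection of the
# complements -- i.e., the acceptable renters.  (The flipped form can be
# checked the same way.)
assert comp(X | Y) == comp(X) & comp(Y)
print(sorted(comp(X | Y)))    # ['Alice', 'Dave']
```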
The other form of De Morgan’s law is stated by flipping the intersections and unions: \[ \overline{X \cap Y} = \overline{X} \cup \overline{Y} \] Work this one out for yourself using a similar example, and convince yourself it’s always true. Augustus De Morgan, by the way, was a brilliant 19th century mathematician with a wide range of interests. His name #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 31 Context: 2.10 Subsets =========== We learned that the “∈” symbol is used to indicate set membership: the element on the left is a member of the set on the right. A related but distinct notion is the idea of a subset. When we say \( X \subseteq Y \) (pronounced “X is a subset of Y”), it means that every member of \( X \) is also a member of \( Y \). The reverse is not necessarily true, of course; otherwise “⊆” would just mean “=”. So if \( A = \{ Dad, Lizzy \} \) and \( K = \{ Dad, Mom, Lizzy \} \), then we can say \( A \subseteq K \). Be careful about the distinction between “∈” and “⊆”, which are often confused. With “∈”, the thing on the left is an element, whereas with “⊆”, the thing on the left is a set. This is further complicated by the fact that the element on the left-hand side of “∈” might well be a set. Let's see some examples. Suppose \( Q \) is the set \( \{ \{ 4, 9 \}, 2, 4 \} \). \( Q \) has three elements here, one of which is itself a set. Now suppose that we let \( P \) be the set \( \{ 4, 9 \} \). Question: Is \( P \in Q \)? The answer is yes: the set \( \{ 4, 9 \} \) (which is the same as the set \( \{ 9, 4 \} \), just written in a different way) is in fact an element of the set \( Q \). Next question: Is \( P \subseteq Q \)? The answer is no, \( P \nsubseteq Q \): for that to be true, both of \( P \)'s elements (4 and 9) would have to be members of \( Q \), and while 4 is a member of \( Q \), 9 is not. Last question: If \( R \) is defined to be \( \{ 2, 4 \} \), is \( R \subseteq Q \)? The answer is yes, since both 2 and 4 are members of \( Q \). Notice that by this definition, every set is a subset of itself. Sometimes, though, it’s useful to talk about whether a set is really a subset of another, and you don’t want it to “count” if the two sets are actually equal. This is called a proper subset, and the symbol for it is \( \subsetneq \). You can see the rationale for the choice of symbol, because “⊆” is kind of like “≤” for numbers, and “⊊” is like “<”. Every set is a subset (not necessarily a proper one) of itself, because every member of a set is, trivially, a member of that same set. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 33 Context: ## 2.11. POWER SETS Don't lose sight of the fact that there are four elements to the power set of \( A \), each of which is one of the possible subsets. It might seem strange to talk about "all of the possible subsets" — when I first learned this stuff, I remember thinking that there would be no limit to the number of subsets you could make from a set. But of course there is. To create a subset, you can either include, or exclude, each one of the original set's members. In \( A \)’s case, you can either (1) include both Dad and Lizzy, or (2) include Dad but not Lizzy, or (3) include Lizzy but not Dad, or (4) exclude both, in which case your subset is empty. Therefore, \( P(A) \) includes all four of those subsets.
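In Python terms (again a sketch of mine, not the book's), `<=` on sets plays the role of “⊆”, `<` plays “⊊”, and the four include-or-exclude choices can be enumerated outright:

```python
from itertools import combinations

A = {"Dad", "Lizzy"}
K = {"Dad", "Mom", "Lizzy"}
assert A <= K              # subset: every member of A is a member of K
assert A < K               # *proper* subset, since A != K

# The power set: for each size k, every way to include exactly k members.
subsets = [set(c) for k in range(len(A) + 1)
           for c in combinations(sorted(A), k)]
print(subsets)             # [set(), {'Dad'}, {'Lizzy'}, {'Dad', 'Lizzy'}]
print(len(subsets))        # 4
```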
Now what's the cardinality of \( P(X) \) for some set \( X \)? That's an interesting question, and one well worth pondering. The answer ripples through the heart of a lot of combinatorics and the binary number system, topics we'll cover later. And the answer is right at our fingertips if we just extrapolate from the previous example. To form a subset of \( X \), we have a choice to either include, or else exclude, each of its elements. So there's two choices for the first element, and whichever way we choose for that first element, there are two choices for the second, etc. So if \( |X| = 2 \) (recall that this notation means \( X \) has two elements, or a cardinality of 2), then its power set has \( 2 \times 2 = 4 \) members. If \( |X| = 3 \), then its power set has \( 2 \times 2 \times 2 = 8 \) members. In general: \[ |P(X)| = 2^{|X|}. \] As a limiting case (and a brain-bender), notice that if \( X \) is the empty set, then \( P(X) \) has one (not zero) members, because there is in fact one subset of the empty set: namely, the empty set itself. So \( |X| = 0 \), and \( |P(X)| = 1 \). And that jives with the above formula. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 35 Context: # 2.12. PARTITIONS All of these are ways of dividing up the Davies family into groups so that no one is in more than one group, and everyone is in some group. The following is **not** a partition: - `{ Mom, Lizzy, T.J. }`, and `{ Dad }` because it leaves out Johnny. This, too, is **not** a partition: - `{ Dad }`, `{ Mom, T.J. }`, and `{ Johnny, Lizzy, Dad }` because Dad appears in two of the subsets. By the way, realize that every set \( S \) together with its (total) complement \( \overline{S} \) forms a partition of the entire domain of discourse \( \Omega \). This is because every element either is, or is not, in any given set. The set of males and non-males are a partition of \( \Omega \) because everything is either a male or a non-male, and never both (inanimate objects and other nouns are non-males, just as women are). The set of prime numbers and the set of everything-except-prime-numbers are a partition. The set of underdone cheeseburgers and the set of everything-except-underdone-cheeseburgers form a partition of \( \Omega \). By pure logic, this is true no matter what the set is. You might wonder why partitions are an important concept. The answer is that they come up quite a bit, and when they do, we can make some important simplifications. Take \( S \), the set of all students at UMW. We can partition it in several different ways. If we divide \( S \) into the set of freshmen, sophomores, juniors, and seniors, we have a partition: every student is one of those grade levels, and no student is more than one. If we group them into in-state and out-of-state students, we again have a partition. Note that dividing \( S \) into computer science majors and English majors does not give us a partition. For one thing, not everyone is majoring in one of those two subjects. For another, some students are majoring in both.
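The two partition requirements — no one in more than one group, everyone in some group — make a tidy mechanical checker. A sketch using the Davies examples from the passage (the five-member family is as implied there):

```python
def is_partition(groups, omega):
    """Mutually exclusive and collectively exhaustive?"""
    union = set()
    for g in groups:
        if union & g:              # overlap: someone appears in two groups
            return False
        union |= g
    return union == omega          # exhaustive: no one left out (or extra)

family = {"Dad", "Mom", "T.J.", "Johnny", "Lizzy"}
print(is_partition([{"Mom", "Lizzy", "T.J."}, {"Dad"}], family))
# False -- Johnny is left out
print(is_partition([{"Dad"}, {"Mom", "T.J."}, {"Johnny", "Lizzy", "Dad"}], family))
# False -- Dad appears twice
print(is_partition([{"Dad", "Mom"}, {"T.J.", "Johnny", "Lizzy"}], family))
# True
```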
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 39 Context: # 2.13. EXERCISES 21. What’s \( A \cap \overline{B} \)? \{ Macbeth, Hamlet \}. 22. What’s \( B \cap T \)? ∅. 23. What’s \( B \cap \overline{T} \)? \( B \). (which is \{ Scrabble, Monopoly, Othello \}.) 24. What’s \( A \cup (B \cap T) \)? \{ Hamlet, Othello, Macbeth \}. 25. What’s \( (A \cup B) \cap T \)? \{ Hamlet \}. 26. What’s \( T \setminus B \)? Simply \( T \), since these two sets have nothing in common. 27. What’s \( T \times A \)? \{ (Hamlet, Macbeth), (Hamlet, Hamlet), (Hamlet, Othello), (Village, Macbeth), (Village, Hamlet), (Village, Othello), (Town, Macbeth), (Town, Hamlet), (Town, Othello) \}. The order of the ordered pairs within the set is not important; the order of the elements within each ordered pair is important. 28. What’s \( T \cap X \)? 0. 29. What’s \( B \times (A \cap T) \)? \{ (Scrabble, Hamlet), (Monopoly, Hamlet), (Othello, Hamlet) \}. 30. What’s \( |A \cup B \cup T| \)? 7. 31. What’s \( |(A \cup B \cup T) \times B| \)? 21. (The first parenthesized expression gives rise to a set with 7 elements, and the second to a set with three elements (\( B \) itself). Each element from the first set gets paired with an element from the second, so there are 21 such pairings.) #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 40 Context: # CHAPTER 2. SETS ### 33. Is A an extensional set, or an intensional set? The question doesn't make sense. Sets aren't "extensional" or "intensional"; rather, a given set can be described extensionally or intensionally. The description given in item 19 is an extensional one; an intensional description of the same set would be "The Shakespeare tragedies Stephen studied in high school." ### 34. Recall that G was defined as { Matthew, Mark, Luke, John }. Is this a partition of G? - { Luke, Matthew } - { John } No, because the sets are not collectively exhaustive (Mark is missing). ### 35. Is this a partition of G? - { Mark, Luke } - { Matthew, Luke } No, because the sets are neither collectively exhaustive (John is missing) nor mutually exclusive (Luke appears in two of them). ### 36. Is this a partition of G? - { Matthew, Mark, Luke } - { John } Yes. (Trivia: this partitions the elements into the synoptic gospels and the non-synoptic gospels). ### 37. Is this a partition of G? - { Matthew, Luke } - { John, Mark } Yes. (This partitions the elements into the gospels which feature a Christmas story and those that don't). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 43 Context: # Chapter 3 ## Relations Sets are fundamental to discrete math, both for what they represent in themselves and for how they can be combined to produce other sets. In this chapter, we’re going to learn a new way of combining sets, called relations. ### 3.1 The Idea of a Relation A relation between a set \( X \) and a set \( Y \) is a subset of the Cartesian product \( X \times Y \). That one sentence packs in a whole heck of a lot, so spend a moment thinking deeply about it. Recall that \( X \times Y \) yields a set of ordered pairs, one for each combination of an element from \( X \) and an element from \( Y \). If \( X \) has 5 elements and \( Y \) has 4, then \( X \times Y \) is a set of 20 ordered pairs. To make it concrete, if \( X \) is the set \{ Harry, Ron, Hermione \} and \( Y \) is the set \{ Dr. Pepper, Mt. Dew \}, then \( X \times Y \) is: - \( (Harry, Dr. Pepper) \) - \( (Harry, Mt. Dew) \) - \( (Ron, Dr. Pepper) \) - \( (Ron, Mt. Dew) \) - \( (Hermione, Dr. Pepper) \) - \( (Hermione, Mt. Dew) \) Convince yourself that every possible combination is in there.
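Generating the product mechanically is a nice cross-check; in Python, `itertools.product` does exactly this "every element with every element" pairing, and a relation is then just some chosen subset of it (a sketch of mine):

```python
from itertools import product

X = ["Harry", "Ron", "Hermione"]
Y = ["Dr. Pepper", "Mt. Dew"]

XxY = set(product(X, Y))            # all ordered (wizard, drink) pairs
assert len(XxY) == len(X) * len(Y)  # |X x Y| = 3 * 2 = 6

# A relation between X and Y: simply specify which pairs are in it.
likes = {("Harry", "Mt. Dew"), ("Hermione", "Dr. Pepper")}
assert likes <= XxY                 # any subset of X x Y qualifies
```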
I listed them out methodically to make sure I didn’t miss any (all the Harry’s first, with each drink in order, then all the Ron’s, etc.), but of course there’s no order to the members of a set, so I could have listed them in any order. Now if I define a relation between \( X \) and \( Y \), I’m simply specifying that certain of these ordered pairs are in the relation, and certain #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 51 Context: # 3.5. PROPERTIES OF ENDORELATIONS It’s not transitive. Remember: to meet any of these properties, they have to fully apply. “Almost” only counts in horseshoes. Let’s try another example: - (Ron, Harry) - (Ron, Ron) - (Harry, Harry) - (Hermione, Hermione) - (Harry, Hermione) - (Hermione, Harry) Is this reflexive? Yes. We’ve got all three wizards appearing with themselves. Is it symmetric? No, since (Ron, Harry) has no match. Is it antisymmetric? No, since (Harry, Hermione) does have a match. Is it transitive? No, since the presence of (Ron, Harry) and (Harry, Hermione) implies the necessity of (Ron, Hermione), which doesn’t appear, so no dice. ## Partial Orders and Posets A couple of other fun terms: an endorelation which is (1) reflexive, (2) antisymmetric, and (3) transitive is called a **partial order**. A set together with a partial order is called a **partially ordered set**, or "poset" for short. The name "partial order" makes sense once you think through an example. You may have noticed that when dogs meet each other (especially male dogs), they often circle each other and take stock of each other and try to establish which is the so-called "alpha dog." This is a pecking order of sorts that many different species establish. Now suppose I have the set D of all dogs, and a relation “IsAtLeastAsToughAs” between them. The relation starts off with every reflexive pair in it: (Rex, Rex), (Fido, Fido), etc. This is because obviously every dog is at least as tough as itself. Now every time two dogs x and y encounter each other, they establish dominance through eye contact or physical intimidation, and then one of the following ordered pairs is added to the relation: either (x, y) or (y, x), but never both. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 60 Context: # CHAPTER 3. RELATIONS own wizard, and no soft drinks (or wizards) are left out. How exciting! This is a perfectly bijective function, also called a **bijection**. Again, the only way to get a bijection is for the domain and codomain to be the same size (although that alone does not guarantee a bijection; witness \( f \), above). Also observe that if they are the same size, then injectivity and surjectivity go hand-in-hand. Violate one, and you’re bound to violate the other. Uphold the one, and you’re bound to uphold the other. There’s a nice, pleasing, symmetrical elegance to the whole idea.
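For small functions stored as dictionaries, all three properties are easy to test mechanically. A sketch in that spirit (the wizard-to-drink data, including "Butterbeer," is my own invention):

```python
def is_injective(f):
    # injective: no two domain elements may share an output
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    # surjective: every codomain element must be "hit" by something
    return set(f.values()) == set(codomain)

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

drinks = {"Dr. Pepper", "Mt. Dew", "Butterbeer"}
g = {"Harry": "Mt. Dew", "Ron": "Dr. Pepper", "Hermione": "Butterbeer"}
print(is_bijective(g, drinks))   # True: every drink gets its own wizard
```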
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 65 Context: # 3.8. EXERCISES 29. **Is faveSport injective?** No, because Julie and Sam (and Chuck) all map to the same value (basketball). For a function to be injective, there must be no two domain elements that map to the same codomain element. 30. **Is there any way to make it injective?** Not without altering the underlying sets. There are three athletes and two sports, so we can’t help but map multiple athletes to the same sport. 31. **Fine. Is faveSport surjective?** No, because no one maps to volleyball. 32. **Is there any way to make it surjective?** Sure, for instance, change Sam from basketball to volleyball. Now both of the codomain elements are "reachable" by some domain element, so it’s surjective. 33. **Is faveSport now also bijective?** No, because it’s still not injective. 34. **How can we alter things so that it’s bijective?** One way is to add a third sport—say, kickboxing—and move either Julie or Chuck over to kickboxing. If we have Julie map to kickboxing, Sam map to volleyball, and Chuck map to basketball, we have a bijection. 35. **How do we normally write the fact that "Julie maps to kickboxing"?** faveSport(Julie) = kickboxing. 36. **What’s another name for "injective"?** One-to-one. 37. **What’s another name for "surjective"?** Onto. 38. **What’s another name for "range"?** Image. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 67 Context: # Chapter 4 ## Probability Probability is the study of **uncertainty**. This may seem like a hopeless endeavor, sort of like knowing the unknowable, but it's not. The study of probability gives us tools for taming the uncertain world we live and program in, and for reasoning about it in a precise and helpful way. We may not know exactly how long a particular visitor is willing to wait for our webpage to load in their browser, but we can use probability to estimate how much traffic we’ll lose if this takes longer than a certain average duration. We may not know which specific passwords a hacker will try as he attempts to break our security protocol, but we can use probability to estimate how feasible this approach will be for him. We may not know exactly when a certain program will run out of RAM and have to swap its data out to virtual memory, but we can predict how often this is likely to occur — and how painful it will be for us — given a certain system load and user behavior. The trick is to use the tools we've already built — sets, relations, functions — to characterize and structure our notions of the relative likelihood of various outcomes. Once those underpinnings are secured, a layer of deductive reasoning will help us make good use of that information to begin to predict the future. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 69 Context: # 4.2 Probability Measures Okay, we've defined sample spaces and events, but when do quantitative notions like "the odds of" and "percent chance" come into play? They enter the scene when we define a **probability measure**. A probability measure is simply a function from the domain of events to the codomain of real numbers. We'll normally use the letters "Pr" for our probability measure. In symbols, \( \text{Pr} : \mathcal{P}(\Omega) \to \mathbb{R} \) (since the set of all events is the power set of the sample space, as per above). There's actually another constraint, though, which is that Pr's values must be in the range of \( [0, 1] \), inclusive. So it's more correct to write: \( \text{Pr} : \mathcal{P}(\Omega) \to [0, 1] \). (You may recall from a previous math course that “[” and “]” are used to describe a closed interval in which the endpoints are included in the interval.)
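A probability measure really can be written as a function from events — subsets of \( \Omega \) — into \( [0, 1] \). A minimal sketch, assuming a single fair die roll (my example, not the book's code):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}           # sample space: one roll of a fair die

def Pr(event):
    """A probability measure: events (subsets of omega) -> [0, 1]."""
    assert event <= omega             # an event must be a subset of omega
    return Fraction(len(event), len(omega))

print(Pr({3, 4, 5, 6}))   # 2/3 -- e.g., "roll higher than a 2"
print(Pr(set()))          # 0   -- the impossible event
print(Pr(omega))          # 1   -- the certain event
```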
The "meaning" of the probability measure is intuitive enough: it indicates how likely we think each event is to occur. In the baby example, if we say \( P(\text{boy}) = 0.5 \), it means there's a 50% probability (a.k.a., a 50% chance) that a male child will be born. In the game example, if we say \( P(\text{M}) = 0.667 \), it means there's a two-thirds chance that my specific outcome is only whether I beat a 2. So I could define the event \( H \) ("first") to be the set \( \{ 3, 4, 5, 6 \} \). I could define the event \( I \) ("first") to be the set \( \{ 1 \} \) (notice \( H \) is still a set, even though it has only one element). Then I could define the event \( T \) ("tie") as the set \( \{ 2 \} \). I now effectively collapsed a larger set of outcomes into only the groups of outcomes I'm interested in. Now I'm all ready to reason about the likelihood that each of these events actually occurs. By the way, "the set of all outcomes" is simply \( \Omega \), since an outcome is an element of \( \Omega \). But an event is a subset of \( \Omega \), not a single element. What, then, is "the set of all events?" If you think it through, you'll realize that it's \( \mathcal{P}(\Omega) \) (the power set of the sample space). Put another way, when defining an event, I can choose any subset of the possible outcomes, and so I can choose any set from \( \mathcal{P}(\Omega) \). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 72 Context: # CHAPTER 4. PROBABILITY ![Figure 4.1: Various events, and their overlap.](image_link_here) Very well then. What we’re really asking here is “what’s \( Pr(U \cup R) \)?” We want the union of the two events, since we’re asking for the probability that either (or both) of them occurs. You might first think that we’d add the two probabilities for the two events and be done with it, but a glance at the diagram tells you this means trouble. \( Pr(U) \) is .4, and \( Pr(R) \) is .7. Even if we weren’t very smart, we’d know something was wrong as soon as we added .4 + .7 = 1.1 to get a probability of over 1 and violate rule 1. But we are smart, and looking at the diagram it’s easy to see what happened: we double-counted Kelly's probability. Kelly was a member of both groups, so her .2 got counted in there twice. Now you can see the rationale for rule 3. To get \( Pr(U \cup R) \) we add \( Pr(U) \) and \( Pr(R) \), but then we have to subtract back out the part we double-counted. And what did we double-count? Precisely the intersection \( U \cap R \). As a second example, suppose we want the probability of an underdog to emerge a winner? \( Pr(U) = .4 \), and \( Pr(F) = .5 \), so the first step is to just add these. Then we subtract out the intersection, which we double counted. In this case, the intersection \( U \cap F \) is just \( L \) (check the diagram), and so subtract out the whole \( L \). The answer is .5, as it should be. By the way, you’ll notice that if the two sets in question are mutually exclusive... #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 74 Context: # 4. PROBABILITY It’s the size (cardinality) of the event set that matters, and the ratio of this number to the total number of events is the probability. 
For example, if we deal from a fair deck, the probability of drawing a face card is: Pr(F) = \(\frac{|F|}{|\Omega|}\) = \(\frac{|\{K♠, K♥, K♦, K♣, \ldots, J♠\}|}{52}\) = \(\frac{12}{52} \approx 0.231\) Please realize that this shortcut **only** applies when the probability of each outcome is the same. We certainly couldn’t say, for example, that the probability of a user’s password starting with the letter q is just \(\frac{1}{26}\), because passwords surely don’t start with all letters with equal frequency. (At least, I’d be very surprised if that were the case.) The only way to solve a problem like this is to know how often each letter of the alphabet occurs. ## 4.3 Philosophical Interlude Which brings me to an important question. How do we get these probability numbers, anyway? Everything so far has assumed that the numbers have been dropped into our lap. The answer depends somewhat on your interpretation of what probability means. If we say *“the probability of getting heads on a coin flip is 0.5,”* what are we really saying? There have traditionally been two opposing answers to this question, called the frequentist view and the Bayesian view. It’s interesting to compare their claims. The frequentist view is that we derive probabilities by simply running many trials and counting the results. The proportions of various outcomes yield a good idea of their probabilities, particularly if the sample size is large. Consider flipping a coin. If we flip a coin ten times and count three heads, we might not have a great #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 81 Context: # 4.5 TOTAL PROBABILITY \[ \begin{aligned} \text{Pr}(A) &= \text{Pr}(A \cap C_1) + \text{Pr}(A \cap C_2) + \cdots + \text{Pr}(A \cap C_N) \\ &= \text{Pr}(A|C_1)\text{Pr}(C_1) + \text{Pr}(A|C_2)\text{Pr}(C_2) + \cdots + \text{Pr}(A|C_N)\text{Pr}(C_N) \\ &= \sum_{k=1}^{N} \text{Pr}(A|C_k)\text{Pr}(C_k) \end{aligned} \] is the formula.¹ Let's take an example of this approach. Suppose that as part of a promotion for Muvico Cinemas movie theatre, we're planning to give a door prize to the 1000th customer this Saturday afternoon. We want to know, though, the probability that this person will be a minor. Figuring out how many patrons overall will be under 18 might be difficult. But suppose we're showing these three films on Saturday: *Spider-Man: No Way Home*, *Here Before*, and *Sonic the Hedgehog 2*. We can estimate the fraction of each movie's viewers that will be minors: .6, .01, and .95, respectively. We can also predict how many tickets will be sold for each film: 2,000 for *Spider-Man*, 500 for *Here Before*, and 1,000 for *Sonic*. Applying frequentist principles, we can compute the probability that a particular visitor will be seeing each of the movies: - \(\text{Pr}(\text{Spider-Man}) = \frac{2000}{2000 + 500 + 1000} = 0.571\) - \(\text{Pr}(\text{Here Before}) = \frac{500}{2000 + 500 + 1000} = 0.143\) - \(\text{Pr}(\text{Sonic}) = \frac{1000}{2000 + 500 + 1000} = 0.286\) ¹ If you're not familiar with the notation in that last line, realize that \(\Sigma\) (a capital Greek "sigma") just represents a sort of loop with a counter. The “\(k=1\)” below the sign means that the counter is \(k\) and that it starts at 1. The “\(N\)” above the sign means the counter goes up to \(N\), which is its last value. And what does the loop do? It adds up a cumulative sum. The thing being added to the total each time through the loop is the expression to the right of the sign. The last line with the \(\Sigma\) is just a more compact way of expressing the preceding line.
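The Muvico computation carries straight through to code — a sketch assuming the ticket counts and minor-fractions quoted above:

```python
tickets = {"Spider-Man": 2000, "Here Before": 500, "Sonic": 1000}
pr_minor_given = {"Spider-Man": 0.60, "Here Before": 0.01, "Sonic": 0.95}

total = sum(tickets.values())                            # 3500 patrons expected
pr_movie = {m: n / total for m, n in tickets.items()}    # .571, .143, .286

# Total probability: sum Pr(minor | movie) * Pr(movie) over the partition.
pr_minor = sum(pr_minor_given[m] * pr_movie[m] for m in tickets)
print(round(pr_minor, 3))                                # 0.616
```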
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 85 Context: # 4.6. BAYES' THEOREM See how that works? If I do have the disease (and there's a 1 in 1,000 chance of that), there's a .99 probability of me testing positive. On the other hand, if I don't have the disease (a 999 in 1,000 chance of that), there's a .01 probability of me testing positive anyway. The sum of those two mutually exclusive probabilities is 0.01098. Now we can use our Bayes' Theorem formula to deduce: \[ P(D|T) = \frac{P(T|D) P(D)}{P(T)} \] \[ P(D|T) = \frac{.99 \cdot \frac{1}{1000}}{0.01098} \approx .0902 \] Wow. We tested positive on a 99% accurate medical exam, yet we only have about a 9% chance of actually having the disease! Great news for the patient, but a head-scratcher for the math student. How can we understand this? Well, the key is to look back at that Total Probability calculation in equation 4.1. Remember that there were two ways to test positive: one where you had the disease, and one where you didn't. Look at the contribution to the whole that each of those two probabilities produced. The first was 0.00099, and the second was 0.00999, over ten times higher. Why? Simply because the disease is so rare. Think about it: the test fails once every hundred times, but a random person only has the disease once every thousand times. If you test positive, it's far more likely that the test screwed up than that you actually have the disease, which is rarer than blue moons. Anyway, all of this about diseases and tests is a side note. The main point is that Bayes' Theorem allows us to recast a search for \(P(X|Y)\) into a search for \(P(Y|X)\), which is often easier to find numbers for. One of many computer science applications of Bayes' Theorem is in text mining. In this field, we computationally analyze the words in documents in order to automatically classify them or form summaries or conclusions about their contents. One goal might be to identify the true author of a document, given samples of the writing of various suspected authors. Consider the Federalist Papers, #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 89 Context: # 4.7 INDEPENDENCE Whoops! One last point on the topic of independence: please don't make the mistake of thinking that mutually exclusive events are independent! This is by no means the case, and, in fact, the opposite is true. If two events are mutually exclusive, they are extremely dependent on each other! Consider the most trivial case: I choose a random person on campus, and define \( I \) as the event that they're an in-state student, and \( O \) as the event that they're out-of-state. Clearly these events are mutually exclusive. But are they independent? Of course not! Think about it: if I told you a person was in-state, would that tell you anything about whether they were out-of-state? Duh. In a mutually exclusive case like this, event \( I \) completely rules out \( O \) (and vice versa), which means that although \( \text{Pr}(I) \) might be .635, \( \text{Pr}(I|O) \) is a big fat zero. More generally, \( \text{Pr}(A|B) \) is most certainly not going to be equal to \( \text{Pr}(A) \) if the two events are mutually exclusive, because learning about one event tells you everything about the other.
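The disease-test arithmetic above is worth replaying in code. A minimal Python sketch (mine, not the book's), using the same numbers:

```
pr_d = 1 / 1000          # Pr(D): prior chance of having the disease
pr_t_given_d = 0.99      # Pr(T|D): test sensitivity
pr_t_given_nd = 0.01     # Pr(T|~D): false-positive rate

# Total Probability gives the denominator Pr(T):
pr_t = pr_t_given_d * pr_d + pr_t_given_nd * (1 - pr_d)   # 0.01098

# Bayes' Theorem: Pr(D|T) = Pr(T|D) * Pr(D) / Pr(T)
print(round(pr_t_given_d * pr_d / pr_t, 4))               # 0.0902
```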
#################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 93 Context: # Chapter 5 ## Structures Much of computer science deals with representing and manipulating information. To do this, people have devised various **structures** for organizing chunks of data in a way that makes it easy to store, search, and retrieve. There's a whole course in most computer science curricula called "data structures" which covers how to implement these structures in code. In this book, we won't be talking about the code, but rather the abstract structures themselves. This chapter has a lot of pictures in it, which depict examples of the various structures in a very general way. The concepts here map directly to code when you need to put them into practice. There are all kinds of data structures — arrays, linked lists, queues, stacks, hashtables, and heaps, to name a few — but they almost all boil down to one of two fundamental kinds of things: graphs, and trees. These are the two structures we'll focus on in this chapter. A graph is just about the most general structure you can envision: a bunch of scattered data elements that are related to each other in some way. Almost every data structure imaginable can be recast as a type of graph. Trees are sort of a special case of graphs, but also sort of a topic in their own right, kind of like functions were a special type of relation, but also kind of different. A tree can be seen as a type of graph that imposes extra special conditions which give some navigational benefit. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 95 Context: # 5.1. GRAPHS The edges, we have precisely the structure of a graph. Psychologists have given this another name: a semantic network. It is thought that the myriad of concepts you have committed to memory — Abraham Lincoln, and a bar of soap, and my fall schedule, and perhaps millions of others — are all associated in your mind in a vast semantic network that links the related concepts together. When your mind recalls information, or deduces facts, or even drifts randomly in idle moments, it's essentially traversing this graph along the various edges that exist between vertices. That's deep. But you don't have to go near that deep to see the appearance of graph structures all throughout computer science. What's MapQuest, if not a giant graph where the vertices are travelable locations and the edges are routes between them? What's Facebook, if not a giant graph where the vertices are people and the edges are friendships? What's the World Wide Web, if not a giant graph where the vertices are pages and the edges are hyperlinks? What's the Internet, if not a giant graph where the vertices are computers or routers and the edges are communication links between them? This simple scheme of linked vertices is powerful enough to accommodate a whole host of applications, which makes it worth studying. ## Graph Terms The study of graphs brings with it a whole bevy of new terms which are important to use precisely: 1. **Vertex.** Every graph contains zero or more vertices.¹ (These are also sometimes called nodes, concepts, or objects.) 2. **Edge.** Every graph contains zero or more edges. (These are also sometimes called links, connections, associations, or relationships.) Each edge connects exactly two vertices, unless the edge connects a vertex to itself, which is possible, believe it or not.
> ¹ The phrase "zero or more" is common in discrete math. In this case, it indicates that the empty graph, which contains no vertices at all, is still a legitimate graph. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 97 Context: # 5.1. GRAPHS ## Figure 5.2: A directed graph. *(Vertices: Sonny Liston, Muhammad Ali, George Foreman, Joe Frazier; the directed-edge layout is not reproduced here.)* ## Figure 5.3: A weighted (and undirected) graph. *(Vertices: Washington, DC; Fredericksburg; Virginia Beach; Richmond, with distances such as 50, 60, 100, and 150 as edge weights.)* **adjacent.** If two vertices have an edge between them, they are said to be adjacent. **connected.** The word connected has two meanings: it applies both to pairs of vertices and to entire graphs. We say that two vertices are connected if there is at least one path between them. Each vertex is therefore "reachable" from the other. In Figure 5.1, President and actor are connected, but Ford's Theatre and Civil War are not. "Connected" is also used to describe entire graphs, if every node can be reached from all others. It's easy to see that Fig. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 102 Context: # CHAPTER 5. STRUCTURES ## Figure 5.7: A graph depicting a symmetric endorelation. ``` (Harry, Ron) (Ron, Harry) (Ron, Hermione) (Hermione, Ron) (Harry, Harry) (Neville, Neville) ``` *(The corresponding graph has edges Harry–Ron and Ron–Hermione, plus loops at Harry and Neville.)* Notice how the loops (edges from a node back to itself) in these diagrams represent ordered pairs in which both elements are the same. Another connection between graphs and sets has to do with partitions. Figure 5.7 was not a connected graph: Neville couldn't be reached from any of the other nodes. Now consider: isn't a graph like this similar in some ways to a partition of \( A \) — namely, this one? - \( \{ Harry, Ron, Hermione \} \) - \( \{ Neville \} \) We've simply partitioned the elements of \( A \) into the groups that are connected. If you remove the edge between Harry and Ron in that graph, you have: - \( \{ Harry \} \) - \( \{ Ron, Hermione \} \) - \( \{ Neville \} \) Then add one between Hermione and Neville, and you now have: - \( \{ Harry \} \) - \( \{ Ron, Hermione, Neville \} \) #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 103 Context: # 5.1 GRAPHS { Harry } and { Ron, Hermione, Neville }. In other words, the "connectedness" of a graph can be represented precisely as a partition of the set of vertices. Each connected subset is in its own group, and every vertex is in one and only one group; therefore, these isolated groups are mutually exclusive and collectively exhaustive. Cool. ## Graph Traversal If you had a long list — perhaps of phone numbers, names, or purchase orders — and you needed to go through and do something to each element of the list — dial all the numbers, scan the list for a certain name, and add up all the orders — it'd be pretty obvious how to do it. You just start at the top and work your way down. It might be tedious, but it's not confusing. Iterating through the elements like this is called **traversing** the data structure. You want to make sure you encounter each element once (and only once) so you can do whatever needs to be done with it. It's clear how to traverse a list. But how to traverse a graph? There is no obvious "first" or "last" node, and each one is linked to potentially many others.
And as we've seen, the vertices might not even be fully connected, so a traversal path through all the nodes might not even exist. There are two different ways of traversing a graph: **breadth-first** and **depth-first**. They provide different ways of exploring the nodes, and as a side effect, each is able to discover whether the graph is connected or not. Let's look at each in turn. ### Breadth-First Traversal With breadth-first traversal, we begin at a starting vertex (it doesn't matter which one) and explore the graph cautiously and delicately. We probe equally deep in all directions, making sure we've looked a little ways down each possible path before exploring each of those paths a little further. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 105 Context: # 5.1. GRAPHS Now it might have occurred to you that we can run into trouble if we encounter the same node multiple times while we're traversing. This can happen if the graph has a cycle: there will be more than one path to reach some nodes, and we could get stuck in an infinite loop if we're not careful. For this reason, we introduce the concept of marking nodes. This is kind of like leaving a trail of breadcrumbs: if we're ever about to explore a node, but find out it's marked, then we know we've already been there, and it's pointless to search it again. So there are two things we're going to do to nodes as we search: - **To mark** a node means to remember that we've already encountered it in the process of our search. - **To visit** a node means to actually do whatever it is we need to do to the node (call the phone number, examine its name for a pattern match, add the number to our total, whatever). Now then, Breadth-first traversal (BFT) is an **algorithm**, which is just a step-by-step, reliable procedure that's guaranteed to produce a result. In this case, it's guaranteed to visit every node in the graph that's reachable from the starting node, and not get stuck in any infinite loops in the process. Here it is: ## Breadth-first traversal (BFT) 1. Choose a starting node. 2. Mark it and enqueue it on an empty queue. 3. While the queue is not empty, do these steps: a. Dequeue the front node of the queue. b. Visit it. c. Mark and enqueue all of its unmarked adjacent nodes (in any order). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 111 Context: # 5.1. GRAPHS and is widely used today by, among other things, network routing protocols. Consider Figure 5.12, a simplified map of France circa November 1944. Fresh U.S. troops are arriving by ship at the port town of Bordeaux, and need to reach Strasbourg as quickly as possible to assist the Allies in pushing Nazi squadrons back into Germany. The vertices of this graph are French cities, and the edge weights represent marching distances in kilometers. Although D-Day was successful, the outcome of the War may depend on how quickly these reinforcements can reach the front. **Figure 5.12:** A weighted graph, through which we desire to find the shortest path from Bordeaux to Strasbourg. *(The map diagram itself is not reproduced here.)* The question, obviously, is which path the troops should take so as to reach Strasbourg the soonest.
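(An aside before we tackle the shortest-path question: the three-step breadth-first procedure above translates almost line-for-line into Python. A minimal sketch, mine rather than the book's, using a small hypothetical graph stored as adjacency sets:)

```
from collections import deque

def bft(graph, start):
    marked = {start}                  # steps 1-2: mark the start node and enqueue it
    queue = deque([start])
    visited = []
    while queue:                      # step 3
        node = queue.popleft()        # 3a: dequeue the front node
        visited.append(node)          # 3b: "visit" it
        for neighbor in graph[node]:  # 3c: mark and enqueue unmarked adjacent nodes
            if neighbor not in marked:
                marked.add(neighbor)
                queue.append(neighbor)
    return visited

g = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
order = bft(g, "A")
print(order)                 # e.g. ['A', 'B', 'C', 'D']
print(len(order) == len(g))  # True exactly when the graph is connected
```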
With a graph this small, you might be able to eyeball it. (Try it!) But Dijkstra's algorithm works systematically on a graph of any size, and is guaranteed to find the path with the shortest total distance. The way it works is to assign each node a tentative lowest distance, along with a tentative path from the start node to it. Then, if the algorithm encounters a shorter path to the same node as it progresses, it updates this tentative distance with the new, lower value. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 116 Context: # CHAPTER 5. STRUCTURES That's immediately closest to the connected group, and adds it without a second thought. There's no notion of "perhaps I'll get a shorter overall edge set if I forego connecting this tempting close node right now." Sometimes, a greedy algorithm turns out to give an optimal result. Often it does not, and more sophisticated approaches can find better solutions. In this case, it happens to work out that the greedy approach does work! Prim's algorithm will always find the set of edges that connects all the nodes and does so with the lowest possible total distance. It's amazing that it can do so, especially since it never backtracks or revises its opinion the way Dijkstra's algorithm does. Let's follow the algorithm's progress in the WWII example. We can start with any node, so we'll pick Vichy just at random. Frame 1 of Figure 5.14 shows what happens when the algorithm begins with Vichy: we simply examine all its neighbors and connect the one that's closest to it. Nothing could be simpler. In this case, Lyon is a mere 250 km away, which is closer than anything else to Vichy, so we connect it and add the Vichy-Lyon edge to our edge set. The figure shows a heavy black line between Vichy and Lyon to show that it will officially be a supply line. And so it goes. In successive frames, we add Marseille, Nice, and Briançon to the set of connected nodes, since we can do no better than 150 km, 150 km, and 200 km, respectively. Note carefully that in frame 4 we connect Briançon to Lyon—not to Nice—because 200 < 250. Note also that the algorithm can jump around from side to side—we aren't looking for the shortest edge from the most recently added node, but from any connected node. The final result is shown in the last frame. This is the best way to connect all the cities to each other, if "best" means "least total supply line distance," which in this case works out to 2450 total kilometers. But if you look carefully, you'll notice a fascinating thing. >This network of edges does not contain the shortest path from Bordeaux to Strasbourg! I find that result dumbfounding. By the way, it's very easy to fall into a trance and always add nodes only to the ends of the growing chain; in fact, I originally did that with this very example! #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 121 Context: ## 5.2. TREES Instead, we would have a different rooted tree, depicted in the right half of the figure. Both of these rooted trees have all the same edges as the free tree did: B is connected to both A and C, F is connected only to A, etc. The only difference is which node is designated the root. ``` 
      A
     / \
    F   B
        |
        C
       / \
      D   E
``` ``` 
        C
      / | \
     B  D  E
     |
     A
     |
     F
``` **Figure 5.16:** Two different rooted trees with the same vertices and edges. Up to now we've said that the spatial positioning on graphs is irrelevant.
But this changes a bit with rooted trees. Vertical positioning is our only way of showing which nodes are "above" others, and the word "above" does indeed have meaning here: it means closer to the root. The altitude of a node shows how many steps it is away from the root. In the right rooted tree, nodes B, D, and E are all one step away from the root (C), while node F is three steps away. The key aspect to rooted trees — which is both their greatest advantage and greatest limitation — is that every node has one and only one path to the root. This behavior is inherited from free trees: as we noted, every node has only one path to every other. Trees have a myriad of applications. Think of the files and folders on your hard drive: at the top is the root of the filesystem (perhaps "/" on Linux/Mac or "C:\" on Windows) and underneath that are named folders. Each folder can contain files as well as other named folders, and so on down the hierarchy. The result is that each file has one, and only one, distinct path to it from the top of the filesystem. The file can be stored, and later retrieved, in exactly one way. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 131 Context: # 5.2. TREES …balanced: every level is completely filled. Figure 5.22 is not perfect, but it would be if we either added nodes to fill out level 4, or deleted the unfinished part of level 3 (as in Figure 5.23). *Figure 5.23: A "perfect" binary tree. (The tree diagram did not survive extraction and is not reproduced here.)* Perfect binary trees obviously have the strictest size restrictions. It's only possible, in fact, to have perfect binary trees with \( 2^{h+1} - 1 \) nodes, if \( h \) is the height of the tree. So there are perfect binary trees with 1, 3, 7, 15, 31, ... nodes, but none in between. In each such tree, \( 2^{h} \) of the nodes (almost exactly half) are leaves. Now as we'll see, binary trees can possess some pretty amazing powers if the nodes within them are organized in certain ways. Specifically, a binary search tree and a heap are two special kinds of binary trees that conform to specific constraints. In both cases, what makes them so powerful is the rate at which a tree grows as nodes are added to it. Suppose we have a perfect binary tree. To make it concrete, let's say it has height 3, which would give it \( 1+2+4+8=15 \) nodes, 8 of which are leaves. Now what happens if you increase the height of this tree to 4? If it's still a "perfect" tree, you will have added 16 more nodes (all leaves). Thus, you have doubled the number of leaves by simply adding one more level. This cascades the more levels you add. A tree of height 5 doubles the number of leaves again (to 32), and height 6 doubles it again (to 64). #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 135 Context: # 5.2 TREES *Figure 5.25: NOT a binary search tree, though it looks like one at first glance. (Notice Nancy and Mitch.)* …possible? The trick is to realize that with every node you look at, you effectively eliminate half of the remaining tree from consideration. For instance, if we're looking for Molly, we can disregard Mitch's entire left half without even looking at it, then the same for Randi's entire right half. If you discard half of something, then half of the remaining half, it doesn't take you long before you've eliminated almost every false lead.
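Here's a small Python sketch of that halving in action (mine, not the book's; Mitch, Randi, and Molly are from the excerpt above, while "Jessica" is a hypothetical filler name). Each comparison discards half of what remains:

```
def bst_lookup(node, target):
    while node is not None:
        if target == node["name"]:
            return node               # found it
        elif target < node["name"]:
            node = node["left"]       # discard the entire right half
        else:
            node = node["right"]      # discard the entire left half
    return None

tree = {"name": "Mitch",
        "left":  {"name": "Jessica", "left": None, "right": None},
        "right": {"name": "Randi",
                  "left":  {"name": "Molly", "left": None, "right": None},
                  "right": None}}

print(bst_lookup(tree, "Molly") is not None)   # True, after examining only 3 nodes
```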
There's a formal way to describe this speedup, called **Big-O notation**. The subtleties are a bit complex, but the basic idea is this. When we say that an algorithm is **O(n)** (pronounced "oh-of-n"), it means that the time it takes to execute the algorithm is *proportional* to the number of nodes. This doesn't imply any specific number of milliseconds or anything — that is highly dependent on the type of computer hardware, the programming language, and a myriad of other things. But what we can say about an **O(n)** algorithm is that if you double the number of nodes, you're going to approximately double the running time. If you quadruple the number of nodes, you're going to quadruple the running time. This is what you'd expect. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 144 Context: # CHAPTER 5. STRUCTURES 10. How many vertices and edges are there in the graph below? *(graph not reproduced)* **7 and 10, respectively.** 11. What's the degree of vertex L? **It has an in-degree of 2, and an out-degree of 1.** 12. Is this graph directed? **Yes.** 13. Is this graph connected? **Depends on what we mean. There are two different notions of "connectedness" for directed graphs. One is strongly connected, which means every vertex is reachable from any other by following the arrows in their specified directions. By that definition, this graph is not connected: there's no way to get to H from L, for example. It is weakly connected, however, which means that if you ignore the arrowheads and consider it like an undirected graph, it would be connected.** 14. Is it a tree? **No. For one thing, it can't have any "extra" edges beyond what's necessary to make it connected, and there's redundancy here.** 15. Is it a DAG? **No: if you look very carefully, you'll see that there is indeed a cycle: I→J→K→L→I. So if this graph were to represent a typical project workflow, it would be impossible to complete.** 16. If we reversed the direction of the I→J edge, would it be a DAG? **Yes. The tasks could now be completed in this order: H, G, L, I, M, K, and finally J.** #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 145 Context: # 5.4. EXERCISES 17. If this graph represented an endorelation, how many ordered pairs would it have? **Answer:** 10. 18. Suppose we traversed the graph below in depth-first fashion, starting with node P. In what order would we visit the nodes? *(graph not reproduced)* **Answer:** There are two possible answers: P, Q, R, S, T, N, O, or else P, O, N, T, S, R, Q. (The choice just depends on whether we go "left" or "right" initially.) Note in particular that either O or Q is at the very end of the list. 19. Now we traverse the same graph breadth-first fashion, starting with node P. Now in what order would we visit the nodes? **Answer:** Again, two possible answers: P, O, Q, N, R, T, S, or else P, Q, O, R, N, S, T. Note in particular that both O and Q are visited very early. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 149 Context: # Chapter 6 ## Counting If the title of this chapter seems less than inspiring, it's only because the kind of counting we learned as children was mostly of a straightforward kind.
In this chapter, we're going to learn to answer some more difficult questions like "how many different semester schedules could a college student possibly have?" and "how many different passwords can a customer choose for this e-commerce website?" and "how likely is this network buffer to overflow, given that its packets are addressed to three different destinations?" The more impressive-sounding name for this topic is **combinatorics**. In combinatorics, we focus on two tasks: counting things (to find out how many there are), and enumerating things (to systematically list them as individuals). Some things turn out to be hard to count but easy to enumerate, and vice versa. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 150 Context: # 6.1 The Fundamental Theorem We start with a basic rule that goes by the audacious name of **The Fundamental Theorem of Counting**. It goes like this: If a whole can be divided into \( k \) parts, and there's \( n_i \) choices for the \( i \)-th part, then there's \( n_1 \times n_2 \times n_3 \times \cdots \times n_k \) ways of doing the whole thing. ## Example Jane is ordering a new Lamborghini. She has twelve different paint colors to choose from (including Luscious Red and Sassy Yellow), three different interiors (Premium Leather, Bonded Leather, or Vinyl), and three different stereo systems. She must also choose between automatic and manual transmission, and she can get power locks & windows (or not). How many different configurations does Jane have to choose from? To put it another way, how many different kinds of cars could come off the line for her? The key is that every one of her choices is independent of all the others. Choosing an Envious Green exterior doesn't constrain her choice of transmission, stereo, or anything else. So no matter which of the 12 paint colors she chooses, she can independently choose any of the three interiors, and no matter what these first two choices are, she can freely choose any of the stereos, etc. It's a mix-and-match. Therefore, the answer is: \[ 12 \times 3 \times 3 \times 2 \times 2 = 432 \] Here's an alternate notation you'll run into for this, by the way: **How many other "Fundamental Theorems" of math do you know?** Here are a few: the Fundamental Theorem of Arithmetic says that any natural number can be broken down into its prime factors in only one way. The Fundamental Theorem of Algebra says that the highest power of a polynomial is how many roots (zeros) it has. The Fundamental Theorem of Linear Algebra says that the row space and the column space of a matrix have the same dimension. The Fundamental Theorem of Calculus says that integration and differentiation are the inverses of each other. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 151 Context: # 6.1. THE FUNDAMENTAL THEOREM \[ \prod_{i=1}^{k} n_i \] which is just a shorter way of writing \[ n_1 \times n_2 \times n_3 \cdots \times n_k. \] As mentioned in section 4.5, the \( \Sigma \) notation is essentially a loop with a counter, and it says to add up the expression to the right of it for each value of the counter. The \( \Pi \) notation is exactly the same, only instead of adding the expressions together for each value of the counter, we're multiplying them.
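Jane's car is a nice test case for the product rule. A quick Python sketch (mine, not the book's) that computes the product directly and then confirms it by brute-force enumeration:

```
from itertools import product
from math import prod

choices = {"paint": 12, "interior": 3, "stereo": 3,
           "transmission": 2, "power package": 2}

print(prod(choices.values()))    # 432, straight from the fundamental theorem

# Brute force agrees: enumerate every mix-and-match configuration.
configs = list(product(*(range(n) for n in choices.values())))
print(len(configs))              # 432
```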
(The reason mathematicians chose the symbols \( \Sigma \) (sigma) and \( \Pi \) (pi) for this, by the way, is that "sigma" and "pi" start with the same letter as "sum" and "product," respectively.) We can actually get a lot of leverage just with the fundamental theorem. How many different PINs are possible for an ATM card? There are four digits, each of which can be any value from 0 to 9 (ten total values), so the answer is: \[ 10 \times 10 \times 10 \times 10 = 10,000 \text{ different PINs.} \] So a thief at an ATM machine frantically entering PINs at random (hoping to break into your account before you call and stop your debit card) would have to try about 5,000 of them on average before cracking the code. What about middle school bullies who are trying to break into your locker? Well, most combination locks are opened by a three-number sequence, each number of which is anything from 0 to 39. So there are: \[ 40 \times 40 \times 40 = 64,000 \text{ different combinations.} \] That's probably slightly overstated, since I'll bet consecutive repeat numbers are not allowed (Master probably doesn't manufacture a #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 153 Context: # 6.1. THE FUNDAMENTAL THEOREM Here's an interesting thought experiment to test your intuition about numbers. Look at the above calculation, and ask yourself: "What if the state of Virginia decided, for purposes of consistency, that all license plates had to have the full 7 characters? Would that significantly reduce the total number of possible plates?" My first inclination would be to say "yes," because we're adding seven things in that equation, and if we mandated 7-character plates for everyone, we'd eliminate 6 out of the 7. Surely we'd be in danger of running out of license plates to give to all the cars! But in fact the new total number of plates would turn out to be: \[ 36^7 = 78,364,164,096 \text{ plates}. \] Wow. We've hardly lost anything by scrapping all the less-than-7-character plates. Turns out that in comparison with the 7-character plates, all the other lengths were a drop in the bucket. This is a powerful illustration of exponential growth. When you modify the exponent, going from something like \(36^6\) to \(36^7\), you get astronomically larger very quickly. This is a good thing to know when all you want is an approximation of some quantity. How many passwords are possible in a system that mandates 6–10 characters per password? Well, you can pretty much ignore all the 6-9 character passwords and just count the 10-character passwords, because there are so many more of those. One last tweak to the license plate example before we move on. Suppose (again, for the sake of consistency) that Virginia outlawed personalized plates and gave everyone a randomly generated 7-character plate. Furthermore, the last four characters of the plate had to be digits instead of letters, so that something like "RFP-6YQ7" would be impossible. Now how many possible plates would there be? In this case, the \( k \) parts don't all have an equal number of choices \( n_i \): \(n_1\) through \(n_3\) are still 36, but now \(n_4\) through \(n_7\) are just 10. So this gives us: \[ 36 \times 36 \times 36 \times 10 \times 10 \times 10 \times 10 = 466,560,000 \text{ plates}. \] #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 156 Context: # CHAPTER 6.
COUNTING This pattern comes up so much that mathematicians have established a special notation for it: \[ n \times (n - 1) \times (n - 2) \cdots \times 1 = n! \; (\text{"n-factorial"}) \] We say there are "3-factorial" different brushing orders for the Davies kids. For our purposes, the notion of factorial will only apply for integers, so there's no such thing as 23.46! or π! (In advanced computer science applications, however, mathematicians sometimes do define factorial for non-integers.) We also define 0! to be 1, which might surprise you. This comes up a heck of a lot. If I give you a jumbled set of letters to unscramble, like "KRIBS" (think of the Jumble® word game in the newspaper), how many different unscramblings are there? The answer is 5! or 120, one of which is BRISK. Let's say I shuffle a deck of cards before playing War. How many different games of War are there? The answer is 52!, since any of the cards in the deck might be shuffled on top, then any but that top card could be second, then any of those two could be third, etc. Ten packets arrive near-simultaneously at a network router. How many ways can they be queued up for transmission? 10! ways, just like a larger Davies family. The factorial function grows really, really fast, by the way, even faster than exponential functions. A five letter word like "BRISK" has 120 permutations, but "AMBIDEXTROUSLY" has 87,178,291,200, ten times the population of the earth. The number of ways to shuffle a deck is \[ 52! \approx 8.065 \times 10^{67} \] so I don't think my boys will end up playing the same War game twice any time soon, nor my wife and I the same bridge hand. --- **"War"** is a mindless card game which involves no strategy or decision-making on the part of the players. Once you shuffle the initial deck, the entire outcome of the game is fixed. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 159 Context: # 6.2. PERMUTATIONS ## Example with 3-letter words | R | S | K | |---|---|---| | R | K | S | | S | R | K | | S | K | R | | K | R | S | | K | S | R | Then, for the 5-letter word: | B | R | I | S | K | |---|---|---|---|---| | B | R | I | K | S | | B | S | K | I | R | | B | K | I | S | R | | B | K | S | I | R | | B | I | R | S | K | | B | I | K | R | S | | ... | ... | ... | ... | ... | ## Partial permutations Sometimes we want to count the permutations of a set, but only want to choose some of the items each time, not all of them. For example, consider a golf tournament in which the top ten finishers (out of 45) all receive prize money, with the first-place winner receiving the most, the second-place finisher a lesser amount, and so on down to tenth place, who receives a nominal prize. How many different finishes are possible to the tournament? In this case, we want to know how many different orderings of golfers there are, but it turns out that past tenth place, we don't care what order they finished in. All that matters is the first ten places. If the top ten are 1. Tiger, 2. Phil, 3. Lee, 4. Rory, ... and ... #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 161 Context: # 6.2. PERMUTATIONS This is pronounced "n-to-the-k-falling," and was invented by one of the most brilliant computer scientists in history, Donald Knuth.
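As a quick sanity check on these numbers, a Python sketch (mine, not the book's) reproduces both the factorial counts above and the "falling" products about to be defined:

```
from math import factorial, perm

print(factorial(5))     # 120 ways to unscramble the letters of BRISK
print(factorial(14))    # 87,178,291,200 orderings of AMBIDEXTROUSLY
print(factorial(52))    # about 8.065e67 ways to shuffle a deck

# math.perm(n, k) computes exactly "n-to-the-k-falling":
print(perm(17, 6))      # 17*16*15*14*13*12 = 8,910,720
```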
To keep straight what \( n^{\underline{k}} \) means, think of it as the same as plain exponentiation, except that the product diminishes instead of staying the same. For example, "17-to-the-6th" is \[ 17^6 = 17 \cdot 17 \cdot 17 \cdot 17 \cdot 17 \cdot 17 \] but "17-to-the-6th-falling" is \[ 17^{\underline{6}} = 17 \cdot 16 \cdot 15 \cdot 14 \cdot 13 \cdot 12. \] In both cases, you're multiplying the same number of terms; it's just that in the second case, these terms are "falling." Anyway, notation aside, partial permutations abound in practice. A late night movie channel might show four classic films back to back every evening. If there are 500 films in the studio's library, how many nightly TV schedules are possible? Answer: \( 500^{\underline{4}} \), since there are 500 choices of what to show at 7 pm, then 499 choices for 9 pm, 498 for 11 pm, and 497 for the 1 am late show. The fastest 41 auto racers will qualify for Sunday's race, and will be placed from Pole Position on down depending on their qualifying time. If 60 cars participate in the qualifying heat, then there are \( 60^{\underline{41}} \) different possible starting configurations for Sunday. Middle schoolers entering sixth grade will be assigned a semester schedule that consists of five "blocks" (periods), each of which will have one of thirteen classes (science, math, orchestra, study hall, etc.). How many schedules are possible? You guessed it, \( 13^{\underline{5}} \). Notice that this is the correct answer only because no repeats are allowed: we don't want to schedule any student for American History more than once. If a student could take the same class more than once in a day, then there would be \( 13^5 \) (not "falling") different possible schedules. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 165 Context: # 6.3. COMBINATIONS The coefficients for this binomial are of course 1 and 1, since \( x \) really means \( 1 \cdot x \) (and likewise for \( y \)). Now if we multiply this by itself, we get: \( (x + y) \cdot (x + y) = x^2 + 2xy + y^2 \), the coefficients of the terms being 1, 2, and 1. We do it again: \( (x^2 + 2xy + y^2) \cdot (x + y) = x^3 + 3x^2y + 3xy^2 + y^3 \) to get 1, 3, 3, and 1, and do it again: \( (x^3 + 3x^2y + 3xy^2 + y^3) \cdot (x + y) = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 \) to get 1, 4, 6, 4, and 1. At this point you might be having flashbacks to Pascal's triangle, which perhaps you learned about in grade school, in which each entry in a row is the sum of the two entries immediately above it (to the left and right), as in Figure 6.1. (If you never learned that, don't worry about it.) ``` 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1 1 5 10 10 5 1 ``` **Figure 6.1:** The first six rows of Pascal's triangle. Now you might be wondering where I'm going with this. What do fun algebra tricks have to do with counting combinations of items? The answer is that the values of \( \binom{n}{k} \) are precisely the coefficients of these multiplied polynomials. Let n be 4, which corresponds to the last polynomial we multiplied out. We can then compute all the combinations of items taken from a group of four: - \( \binom{4}{0} = 1 \) - \( \binom{4}{1} = 4 \) - \( \binom{4}{2} = 6 \) - \( \binom{4}{3} = 4 \) - \( \binom{4}{4} = 1 \) In other words, there is exactly **one** way of taking no items out of 4 (you simply don't take any). There are **four** ways of taking one out of 4. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 168 Context: # CHAPTER 6.
COUNTING We each do a quick 30-minute workout session divided into six 5-minute blocks, and we work with one of the machines during each block, taking turns spotting each other. One day my friend asks me, "Hey Stephen, have you ever wondered: how many different workout routines are possible for us?" I was, of course, wondering exactly that. But the correct answer turns out to hinge very delicately on exactly what "a workout routine" is. If we could select any weight machine for any 5-minute block, then the answer is \( 18^6 \), since we have 18 choices for our first block, 18 choices for our second, and so on. (This comes to 34,012,224 different routines, if you're interested.) However, on further inspection, we might change our mind about this. Does it make sense to choose the same machine more than once in a 30-minute workout? Would we really complete a workout that consisted of "1. Biceps, 2. Abs, 3. Pec, 4. Biceps, 5. Biceps, 6. Biceps?" If not (and most trainers would probably recommend against such monomaniacal approaches to exercise), then the real answer is only \( 18^{\underline{6}} \), since we have 18 choices for our first block, and then only 17 for the second, 16 for the third, etc. (This reduces the total to 13,366,080.) But perhaps the phrase "a workout routine" means something different even than that. If I tell my physical therapist what "my workout routine" consisted of this morning, does he really care whether I did triceps first, last, or in the middle? He probably only cares about which machines (and therefore which muscle groups) I worked out that morning, not what order I did them in. If this is true, then our definition of a workout routine is somewhat different than before. It's no longer a consecutive sequence of machines chosen, but rather a set of six machine choices. There would be only \( \binom{18}{6} \) of those, or a mere 18,564. So as you can see, the answer radically depends on the precise interpretation of the concepts, which means that to successfully do combinatorics, you have to slow down and think very carefully. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 170 Context: # CHAPTER 6. COUNTING 1. Go back to when the child did have to choose something from each category, but now say they can choose any number of accessories (so they could have the wizard's cape, a batman helmet, plus a lightsaber, pipe, and scepter). Now how many costumes are there? This is \(4 \times 5 \times 2^9\), or a whopping 10,240 for those of you keeping score. The \(9\) changed to a \(2^9\) because now for each accessory, a costume might include it, or exclude it. That's two independent choices for each accessory. 2. Okay, that's overkill. A kid only has two hands, after all, so handling nine accessories would be a disastrous challenge. Let's say instead that a child can choose up to three accessories (but must have at least one). Now how many costume choices are there? Now it's \(4 \times 5 \times \left( \binom{9}{1} + \binom{9}{2} + \binom{9}{3} \right)\), which is equal to \(4 \times 5 \times (9 + 36 + 84)\) or 2,580 possible costumes. 3. Ignoring the at-least-one-child-and-adult constraint for the moment, the total number of groups would seem to be \( \binom{17}{3} + \binom{17}{4} + \binom{17}{5} = 680 + 2380 + 6188 = 9{,}248 \) possible groups. But of course this is an overcount, since it includes groups with no children and groups with no adults. We'll use the trick from p. 146 to subtract those out.
Now how many size-3-to-5 groups with no adults (all kids) are there? \( \binom{11}{3} + \binom{11}{4} + \binom{11}{5} = 165 + 330 + 462 = 957 \). And how many size-3-to-5 groups with no children (all adults) are there? \( \binom{6}{3} + \binom{6}{4} + \binom{6}{5} = 20 + 15 + 6 = 41 \). Therefore, by the p. 146 trick, the total number of legal groups is \( 9{,}248 - 957 - 41 = 8{,}250 \). Final answer. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 171 Context: # 6.5. EXERCISES 1. To encourage rivalry and gluttony, we're going to give a special certificate to the child who collects the most candy at the end of the night. And while we're at it, we'll give 2nd-place and 3rd-place certificates as well. How many different ways could our 1st-2nd-3rd contest turn out? **Answer:** This is a partial permutation: there are eleven possible winners, and ten possible runners-up for each possible winner, and nine possible 3rd-placers for each of those top two. The answer is therefore \( 11^{\underline{3}} \), or 990. Wow! I wouldn't have guessed that high. 2. Finally, what if we want every kid to get a certificate with their name and place-of-finish on it? How many possibilities? (Assume no ties.) **Answer:** This is now a full-blown permutation: \( 11! \). It comes to 39,916,800 different orders-of-finish, believe it or not. I told you: this counting stuff can explode fast. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 175 Context: # 7.1. WHAT IS A "NUMBER?" When you think of a number, I want you to try to erase the sequence of digits from your mind. Think of a number as what it is: a **quantity**. Here's what the number seventeen really looks like: *(a picture of seventeen circles)* It's just an **amount**. There are more circles in that picture than in some pictures, and fewer than in others. But in no way is it "two digits," nor do the particular digits "1" and "7" come into play any more or less than any other digits. Let's keep thinking about this. Consider this number, which I'll label "A": (A) Now let's add another circle to it, creating a different number I'll call "B": (B) And finally, we'll do it one more time to get "C": (C) (Look carefully at those images and convince yourself that I added one circle each time.) When going from A to B, I added one circle. When going from B to C, I also added one circle. Now I ask you: was going from B to C any more "significant" than going from A to B? Did anything qualitatively different happen? The answer is obviously no. Adding a circle is adding a circle; there's nothing more to it than that. But if you had been writing #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 188 Context: # CHAPTER 7. NUMBERS 10001 1. (Step 2) Divide 3 by 2 and take the floor, which gives \(\lfloor 3/2 \rfloor = 1\). This still isn't zero, so make 1 our new value, move our pencil to the left of the 0, and go back to step 1. 2. (Step 1) Compute 1 mod 2. Since 1 is odd, this is one, which we write down: 110001 3. (Step 2) Divide 1 by 2 and take the floor, which gives \(\lfloor 1/2 \rfloor = 0\). We're done. The final answer is 110001₂. Double-checking our work, we verify that indeed 32 plus 16 plus 1 gives 49, which is what we started with. ## Converting to and from hex That was pretty tedious. But converting back and forth from binary to hex is a snap. That's because 16 is \(2^4\), and so one hex digit is exactly equal to four binary digits. This isn't the case with base 10, where one decimal digit is equal to three binary digits… plus a little extra.
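The repeated mod-and-divide procedure just worked through is tiny in code. A Python sketch (mine, not the book's), checked against the worked example of 49:

```
def to_binary(n):
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits   # step 1: n mod 2 is the next digit, written leftward
        n = n // 2                     # step 2: divide by 2 and take the floor
    return digits or "0"

print(to_binary(49))     # 110001
print(int("110001", 2))  # 49 -- double-checking: 32 + 16 + 1
```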
This "not quite a whole number of digits" thing is what makes converting from decimal to binary (or decimal to hex, for that matter) so awkward. We most commonly deal with sets of eight bits at a time, which is called a byte. (This is the fundamental unit of storage on pretty much every computer on earth.) Suppose I had the following byte: 10000010₂ Because one hex digit is exactly equal to four bits, this byte is exactly equal to: 82₁₆ #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 197 Context: 7.4. BINARY (BASE 2) ===================== will contain one more digit than the original numbers did. You've probably seen this on a hand calculator when you press "=" and get an "E" (for "error") in the display. If there are only ten digits on your display, adding two ten-digit numbers will (sometimes) result in an eleven-digit number that your calculator can't display, and it's alerting you to that fact so you don't misinterpret the result. Here, we might add two 8-bit quantities and end up with a 9-bit quantity that can't fit in one byte. This situation is called overflow, and we need to detect when it occurs. The rules for detecting overflow are different depending on the scheme. For unsigned numbers, the rule is simple: if a 1 is carried out from the MSB (far left-side), then we have overflow. So if I were to try to add \( 155_{10} \) and \( 108_{10} \): ```
   11111
   10011011   ← 155₁₀
 + 01101100   ← 108₁₀
 -----------
  100000111
``` then I get a carry out left into the 9th digit. Since we can only hold eight digits in our result, we would get a nonsensical answer (\(7_{10}\)), which we can detect as bogus because the carry out indicated overflow. Sign-magnitude works the same way, except that I have one fewer bit when I'm adding and storing results. (Instead of a byte's worth of bits representing magnitude, the left-end bit has been reserved for a special purpose: indicating the number's sign.) Therefore, if I add the remaining 7-bit quantities and get a carry out left into the eighth digit, that would indicate overflow. Now with two's-complement, things are (predictably) not that easy. But in this case, they're almost as easy. There's still a simple rule to detect overflow; it's just a different rule. The rule is: if the carry in to the last (left-most) bit is different than the carry out from the last bit, then we have overflow. Let's try adding \( 103_{10} \) and \( 95_{10} \) in two's-complement notation: #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 201 Context: # 7.5 EXERCISES 6. If I told you that 98,243,917,215 mod 7 was equal to 1, would you call me a liar without even having to think too hard? No, you shouldn't. It turns out that the answer is 3, not 1, but how would you know that without working hard for it? 7. If I told you that 273,111,999,214 mod 6 was equal to 6, would you call me a liar without even having to think too hard? Yes, you should. Any number mod 6 will be in the range 0 through 5, never 6 or above. (Think in terms of repeatedly taking out groups of six from the big number. The mod is the number of stones you have left when there are no more whole groups of six to take. If towards the end of this process there are six stones left, that's not a remainder because you can get another whole group!) 8. Are the numbers 18 and 25 equal? Of course not. Don't waste my time. 9.
Are the numbers 18 and 25 congruent mod 7? Yes. If we take groups of 7 out of 18 stones, we'll get two such groups (a total of 14 stones) and have 4 left over. And then, if we do the same with 25 stones, we'll get three such groups (a total of 21 stones) and again have 4 left over. So they are congruent mod 7. 10. Are the numbers 18 and 25 congruent mod 6? No. If we take groups of 6 out of 18 stones, we'll get three such groups with nothing left over. But if we start with 25 stones, we'll take out 4 such groups (for a total of 24 stones) and have one left over. So they're not congruent mod 6. 11. Are the numbers 617,418 and 617,424 equal? Of course not. Don't waste my time. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 205 Context: # Chapter 8 ## Logic To a great extent, logic governs the way your mind works, even among so-called "irrational people." If we want to capture logical processes and represent them in a computer program, we need a way to express these thoughts in a form suitable for automated reasoning. This is primarily why computer scientists study logic. Interestingly, the material in this chapter covers the very bottom and the very top of the technology stack. At the bottom, we have actual physical hardware that consists of circuits turning bits on and off. The rules that govern when we want to turn which bits on and off are based on "logic gates," or tiny physical devices that implement the logical principles of this chapter on a micro scale. At the other end of the spectrum, we have highly abstract programs aiming towards "artificial intelligence." These systems are centered around a "knowledge base" of accumulated facts, and regularly examine those known facts to make decisions and draw additional conclusions. What does a knowledge base consist of? You guessed it: logical statements that are described in this chapter. ### 8.1 Propositional logic The simpler — but less powerful — of the two logic systems we'll study is called **propositional logic**. It has this name because the core building block is the **proposition**. A proposition is simply a #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 207 Context: # 8.1. PROPOSITIONAL LOGIC - \( \land \) — "and" - \( \neg \) — "not" - \( \lor \) — "or" - \( \Rightarrow \) — "implies" (or "if...then...") - \( \oplus \) — "xor" (exclusive "or") - \( \equiv \) — "equiv" (equivalent) Just as the ordinary arithmetic operators (\(+\), \(-\), etc.) can be used to join numbers and produce another number, and just as the set operators can be used to join sets and produce another set, the logical operators can be used to join propositions and produce another proposition. The expression \( 34 + 59 \) produces the number 93. The expression \( \{X, Y\} \cup \{Z\} \) produces the set \( \{X, Y, Z\} \). And the expression \( A \land B \) produces the value **false**, since although \( UMW \) is located in Virginia, the King is not female. Let's run through the six operators, some of which are intuitive and some of which are not: ### \( \land \) ("and") The proposition \( X \land Y \) is true when both \( X \) and \( Y \) are true propositions. \( A \land C \) represents the proposition "UMW is in Virginia and dogs are carnivores," which has a truth value of **true** since both components are true. This operation is sometimes called a **conjunction**.
Notice that the "∧" sign somewhat resembles the "∩" sign for set intersection. This is not an accident. An element is in the intersection of two sets if it is a member of the first and the second set. Hence mathematicians have chosen symbols which reinforce this connection. ### \( \lor \) ("or") The proposition \( X \lor Y \) is true when either \( X \) or \( Y \) (or both) are true propositions. \( B \lor C \) represents the proposition "The King of England is female or dogs are carnivores," which has a truth value of **true** since the second component is true. This operation is sometimes called a **disjunction**. The \( \lor \) looks somewhat like the "∪" sign for union, since an element is in the union of two sets if it is an element of the first set or the second set (or both). This operator is sometimes called an **inclusive or** since it is true even if both propositions are true. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 216 Context: # CHAPTER 8. LOGIC | X | Y | X ∨ Y | ¬(X ∨ Y) | ¬X | ¬Y | ¬X ∧ ¬Y | ¬(X ∨ Y) ↔ (¬X ∧ ¬Y) | |---|---|-------|-----------|----|----|---------|------------------------| | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | This is one of De Morgan's Laws, which we've seen previously with regards to sets (p. 21). Here is the other: | X | Y | X ∧ Y | ¬(X ∧ Y) | ¬X | ¬Y | ¬X ∨ ¬Y | ¬(X ∧ Y) ↔ (¬X ∨ ¬Y) | |---|---|-------|-----------|----|----|---------|------------------------| | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | The first can be expressed as "the negation of the disjunction is equal to the conjunction of the negations," and the second as "the negation of the conjunction is equal to the disjunction of the negations." If that helps at all. One last identity is this one: | X | Y | Z | Y ∨ Z | X ∧ (Y ∨ Z) | X ∧ Y | X ∧ Z | (X ∧ Y) ∨ (X ∧ Z) | A* | |---|---|---|-------|--------------|-------|-------|---------------------|----| | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | *Here, "A" is X ∧ (Y ∨ Z) ⇔ (X ∧ Y) ∨ (X ∧ Z). This is none other than the distributive law, which we also saw for set union and intersection (p. 20) and which you should also remember from introductory algebra: \( x \cdot (y + z) = x \cdot y + x \cdot z \). It's interesting, actually, when you compare the distributive law from algebra to the distributive law for logic: \[ x \cdot (y + z) = x \cdot y + x \cdot z \] \[ X \land (Y \lor Z) \equiv (X \land Y) \lor (X \land Z) \] #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 217 Context: # 8.2 Predicate logic The \( \land \) operator is analogous to \( \times \) (times), while \( \lor \) corresponds to \( + \) (plus).
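These identities, including the times/plus analogy just mentioned, are easy to machine-check. A small Python sketch (mine, not the book's) that exhaustively verifies De Morgan's Laws, the distributive law, and the arithmetic resemblance:

```
from itertools import product

for x, y in product([0, 1], repeat=2):
    # De Morgan's Laws:
    assert int(not (x or y)) == int((not x) and (not y))
    assert int(not (x and y)) == int((not x) or (not y))
    # "and-ing" is multiplying; "or-ing" is adding (capped at 1):
    assert int(x and y) == x * y
    assert int(x or y) == min(1, x + y)

for x, y, z in product([0, 1], repeat=3):
    # The distributive law:
    assert int(x and (y or z)) == int((x and y) or (x and z))

print("All identities hold.")
```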
In fact, if you look at the truth tables for these two operators again, you'll see an uncanny resemblance: | X | Y | X ∧ Y | X ∨ Y | |---|---|-------|-------| | 0 | 0 | 0 | 0 | | 0 | 1 | 0 | 1 | | 1 | 0 | 0 | 1 | | 1 | 1 | 1 | (1) | Except for the `(1)` that I put in parentheses, this truth table is exactly what you'd get if you mathematically multiplied (`∧`) and added (`∨`) the inputs! At some level, logically "and-ing" is multiplying, while "or-ing" is adding. Fascinating. ## 8.2 Predicate logic Propositional logic can represent a lot of things, but it turns out to be too limiting to be practically useful. That has to do with the atomic nature of propositions. Every proposition is its own opaque chunk of truthhood or falsity, with no way to break it down into constituent parts. Suppose I wanted to claim that every state in the union had a governor. To state this in propositional logic, I'd have to create a brand new proposition for each state: - Let G1 be the proposition that Alabama has a governor. - Let G2 be the proposition that Alaska has a governor. - Let G3 be the proposition that Arizona has a governor. - ... and then, finally, I could assert: G1 ∧ G2 ∧ G3 ∧ ... ∧ G50. That's a lot of work just to create a whole bunch of individual propositions that are essentially the same. What we need is some #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 219 Context: # 8.2. Predicate Logic which is perfectly true.¹ You may recall the word "predicate" from your middle school grammar class. Every sentence, remember, has a subject and a predicate. In "Billy jumps," "Billy" is the subject, and "jumps" the predicate. In "The lonely boy ate spaghetti with gusto," we have "the lonely boy" as the subject and "ate spaghetti with gusto" as the predicate. Basically, a predicate is anything that can describe or affirm something about a subject. Imagine asserting "JUMPS(Billy)" and "ATESPAGHETTIWITHGUSTO(lonely boy)." A predicate can have more than one input. Suppose we define the predicate `IsFanOf` as follows: Let `IsFanOf(x, y)` be the proposition that x digs the music of rock band y. Then I can assert: - `IsFanOf(Stephen, Led Zeppelin)` - `IsFanOf(Rachel, The Beatles)` - `IsFanOf(Stephen, The Beatles)` - `¬IsFanOf(Stephen, The Rolling Stones)` We could even define `TraveledToByModeInYear` with a bunch of inputs: Let `TraveledToByModeInYear(p, d, m, y)` be the proposition that person p traveled to destination d by mode m in year y. The following statements are then true: - `TraveledToByModeInYear(Stephen, Richmond, car, 2017)` > ¹ By the way, when I say you can give any input at all to a predicate, I mean any individual element from the domain of discourse. I don't mean that a set of elements can be an input. This limitation is why it's called "first-order" predicate logic. If you allow sets to be inputs to predicates, it's called "second-order predicate logic," and can get quite messy. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 222 Context: # CHAPTER 8. LOGIC So these two cases both result in true. But perhaps surprisingly, we also get true for oatmeal: ```
Human(oatmeal) ⇒ Adult(oatmeal) ⊕ Child(oatmeal)
false ⇒ false ⊕ false
false ⇒ false
true ✓
``` Whoa, how did true pop out of that? Simply because the premise was false, and so all bets were off.
We effectively said “if a bowl of oatmeal is human, then it will either be an adult or a child. But it’s not, so never mind!” Put another way, the bowl of oatmeal did not turn out to be a counterexample, and so we’re confident claiming that this expression is true “for all \( x \): \( \forall \)”. The other kind of quantifier is called the **existential quantifier**. As its name suggests, it asserts the existence of something. We write it \( \exists \) and pronounce it “there exists.” For example, - \( \exists \, \text{HasGovernor}(x) \) asserts that there is at least one state that has a governor. This doesn’t tell us how many states this is true for, and in fact despite their name, quantifiers really aren’t very good at “quantifying” things for us, at least numerically. As of 2008, the statement - \( \text{President}(x) \land \text{African-American}(x) \) is true, and always will be, no matter how many more African-American U.S. presidents we have. Note that in compound expressions like this, a variable (like \( x \)) always stands for a single entity wherever it appears. For hundreds of years there existed African-Americans, and there have existed Presidents, so the expression above would be ridiculously obvious if it meant only “there have been Presidents, and there have been African-Americans.” But the same variable \( x \) being used as inputs to both predicates is what seals the deal and makes it represent the much stronger statement: “there is at least one individual who is personally both African-American and President of the United States at the same time.” #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 224 Context: # CHAPTER 8. LOGIC ## Order matters When you're facing an intimidating morass of \( \forall \)s and \( \exists \)s and \( \vee \)s and \( \wedge \)s, and God knows what else, it's easy to get lost in the sauce. But you have to be very careful to dissect the expression to find out what it means. Consider this one: \[ \forall x \in \mathbb{R}, \exists y \in \mathbb{R} \quad y = x + 1. \] (8.5) This statement is true. It says that for every single real number (call it \( x \)), it's true that you can find some other number (call it \( y \)) that's one greater than it. If you generate some examples it's easy to see this is true. Suppose we have the real number \( x = 5 \). Is there some other number \( y \) that's equal to \( 2 + x \)? Of course, the number 6. What if \( x = -32.4 \)? Is there a number \( y \) that satisfies this equation? Of course, \( y = -31.4 \). Obviously not matter what number \( x \) we choose, we can find the desired number \( y \) just by adding one. Hence this statement is true for all \( x \), just like it says. What happens, though, if we innocently switch the order of the quantifiers? Let's try asserting this: \[ \exists y \in \mathbb{R}, \forall x \in \mathbb{R} \quad y = x + 1. \] (8.6) Is this also true? Look carefully. It says “there exists some magic number \( y \) that has the following amazing property: no matter what value of \( x \) you choose, this \( y \) is one greater than \( x \)!” Obviously this is not true. There is no such number \( y \). If I choose \( y = 13 \), that works great as long as I choose \( x = 12 \), but for any other choice of \( x \), it’s dead in the water. ## The value of precision This fluency with the basic syntax and meaning of predicate logic was our only goal in this chapter. 
There are all kinds of logical rules that can be applied to predicate logic statements in order to #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 226 Context: # Chapter 8: Logic Here's a funny one I’ll end with. Consider the sentence **“He made her duck.”** What is intended here? Did some guy reach out with his hand and forcefully push a woman’s head down out of the way of a screaming projectile? Or did he prepare a succulent dish of roasted fowl to celebrate her birthday? Oh, if the computer could only know. If we’d used predicate logic instead of English, it could! #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 231 Context: # Chapter 9 ## Proof We've seen a lot of pretty sights on our cool brisk walk. We've caught a glimpse of the simple elegance of sets and relations, the precision of probabilistic reasoning, the recursive structure of trees, the explosive nature of combinatorics, and much more. None of these things have we plumbed to the depths, but we've appreciated their beauty and taken note of where they stood along our blazed trail. You’ll remember this hike when you run into such concepts again and again in future computer science and math courses, and in your career beyond academics. Now we have one more step to make before returning to the trailhead, and that deals with the notion of **proof**. As we've studied these various mathematical entities, I’ve pointed out certain of their properties. A free tree has one more vertex than edge, for example. The cardinality of the union of two sets is at least as big as each of their individual unions. If you flip-all-the-bits-and-add-one in a two's complement scheme, and then perform that flip-and-add operation again, you’ll return to the original number. But with a few exceptions, we haven't proven any of these things. I’ve just stated them, and you’ve taken them on faith. In order to establish reliable truth, of course, professional mathematicians aren’t satisfied with unsubstantiated statements. They need to be convinced that the claims we make do truly hold, and provably so, in all circumstances. What they seek is a **proof** of a... #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 233 Context: # 9.1. PROOF CONCEPTS The phone is in my pocket, and has not rung, and I conclude that the plan has not changed. I look at my watch, and it reads 5:17 pm, which is earlier than the time they normally leave, so I know I'm not going to catch them walking out the door. This is, roughly speaking, my thought process that justifies the conclusion that the house will be empty when I pull into the garage. Notice, however, that this prediction depends precariously on several facts. What if I spaced out the day of the week, and this is actually Thursday? All bets are off. What if my cell phone battery has run out of charge? Then perhaps they did try to call me but couldn’t reach me. What if I set my watch wrong and it’s actually 4:17 pm? Etc. Just like a chain is only as strong as its weakest link, a whole proof falls apart if even one step isn’t reliable. Knowledge bases in artificial intelligence systems are designed to support these chains of reasoning. They contain statements expressed in formal logic that can be examined to deduce only the new facts that logically follow from the old. 
Suppose, for instance, that we had a knowledge base that currently contained the following facts: 1. \( A = C \) 2. \( \neg (C \land D) \) 3. \( (F \lor E) \to D \) 4. \( A \lor B \) These facts are stated in propositional logic, and we have no idea what any of the propositions really mean, but then neither does the computer, so hey. Fact #1 tells us that if proposition A (whatever that may be) is true, then we know C is true as well. Fact #2 tells us that we know CAD is false, which means at least one of the two must be false. And so on. Large knowledge bases can contain thousands or even millions of such expressions. It’s a complete record of everything the system “knows.” Now suppose we learn an additional fact: \( \neg B \). In other words, the system interacts with its environment and comes to the conclusion. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 235 Context: ``` 9.2 TYPES OF PROOF ================== to be true, and so it is legal grounds from which to start. A proof can't even get off the ground without axioms. For instance, in step 1 of the above proof, we noted that either A or B must be true, and so if B isn't true, then A must be. But we couldn't have taken this step without knowing that disjunctive syllogism is a valid form of reasoning. It's not important to know all the technical names of the rules that I included in parentheses. But it is important to see that we made use of an axiom of reasoning on every step, and that if any of those axioms were incorrect, it could lead to a faulty conclusion. When you create a valid proof, the result is a new bit of knowledge called a theorem which can be used in future proofs. Think of a theorem like a subroutine in programming: a separate bit of code that does a job and can be invoked at will in the course of doing other things. One theorem we learned in chapter 2 was the distributive property of sets; that is, \( X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z) \). This can be proven through the use of Venn diagrams, but once you've proven it, it's accepted to be true, and can be used as a "given" in future proofs. ## 9.2 Types of Proof There are a number of accepted "styles" of doing proofs. Here are some important ones: ### Direct Proof The examples we've used up to now have been direct proofs. This is where you start from what's known and proceed directly by positive steps towards your conclusion. Direct proofs remind me of a game called "word ladders," invented by Lewis Carroll, that you might have played as a child: ``` WARM | | ? ? ? ? | | COLD ``` ``` #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 238 Context: 230 # CHAPTER 9. PROOF An irrational number is one that **cannot** be expressed as the ratio of two integers, no matter what the integers are. Proving this directly seems pretty hard, since how do you prove that there aren't any two integers whose ratio is \(\sqrt{2}\), no matter how hard you looked? I mean, \(534,927\) and \(378,250\) are pretty dang close: \[ \left( \frac{534927}{378250} \right)^2 = 2.000005. \] How could we possibly prove that no matter how hard we look, we can never find a pair that will give it to us exactly? One way is to assume that \(\sqrt{2}\) is a rational number, and then prove that down that path lies madness. It goes like this: Suppose \(\sqrt{2}\) is rational, after all. 
That means that there must be two integers, call them \(a\) and \(b\), whose ratio is exactly equal to \(\sqrt{2}\): \[ \frac{a}{b} = \sqrt{2}. \] This, then, is the starting point for our indirect proof. We're going to proceed under this assumption and see where it leads us. By the way, it's clear that we could always reduce this fraction to lowest terms in case it's not already. For instance, if \(a = 6\) and \(b = 4\), then our fraction would be \(\frac{3}{2}\), which is the same as \(\frac{6}{4}\), so we could just say \(a = 3\) and \(b = 2\) and start over. Bottom line: if \(\sqrt{2}\) is rational, then we can find two integers \(a\) and \(b\) that have no common factor (if they do have a common factor, we'll just divide it out of both of them and go with the new numbers) whose ratio is \(\sqrt{2}\). Okay then. But now look what happens. Suppose we square both: #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 240 Context: 232 # CHAPTER 9. PROOF “recursion,” because this is exactly what it is. Remember that we discussed recursion in the context of rooted trees (see p.116). A tree can be thought of as a node with several children—each of which are, in turn, trees. Each of them is the root node of a tree comprised of yet smaller trees, and so on and so forth. If you flip back to the left-hand side of Figure 5.16 on p.113, you’ll see that A is the root of one tree, and its two children, F and B, are roots of their own smaller trees in turn. If we were to traverse this tree in (say) pre-order, we’d visit the root, then visit the left and right subtrees in turn, treating each of them as their own tree. In this way we’ve broken up a larger problem (traversing the big tree) into smaller problems (traversing the smaller trees F and B). The A node has very little to do: it just visits itself, then defers all the rest of the work onto its children. This idea of pawning off most of the work onto smaller subproblems that you trust will work is key to the idea of inductive proofs. Mathematical induction is hard to wrap your head around because it feels like cheating. It seems like you never actually prove anything: you defer all the work to someone else, and then declare victory. But the chain of reasoning, though delicate, is strong as iron. ## Casting the problem in the right form Let’s examine that chain. The first thing you have to be able to do is express the thing you’re trying to prove as a predicate about natural numbers. In other words, you need to form a predicate that has one input, which is a natural number. You’re setting yourself up to prove that the predicate is true for all natural numbers. (Or at least, all natural numbers of at least a certain size.) Suppose I want to prove that in the state of Virginia, all legal drinkers can vote. Then I could say “let VOTE(n) be the proposition that a citizen of age n can vote.” If I want to prove an algebraic identity, like \[ \sum_{i=1}^{n} \frac{i}{n+1} = \frac{n(n+1)}{2}, \] then I have to figure out which variable is the one that needs to vary across the natural numbers. In this case it’s the \( n \) variable in my equation. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 244 Context: # Example 2 A famous story tells of Carl Friedrich Gauss, perhaps the most brilliant mathematician of all time, getting in trouble one day as a schoolboy. 
As punishment, he was sentenced to tedious work: adding together all the numbers from 1 to 100. To his teacher's astonishment, he came up with the correct answer in a moment, not because he was quick at adding integers, but because he recognized a trick. The first number on the list (1) and the last (100) add up to 101. So do 3 and 98, and so do 4 and 97, etc., all the way up to 50 and 51. So really what you have here is 50 different sums of 101 each, so the answer is \( 50 \times 101 = 5050 \). In general, if you add the numbers from 1 to \( x \), where \( x \) is any integer at all, you'll get \( \frac{x(x + 1)}{2} \). Now, use mathematical induction to prove that Gauss was right (i.e., that \( \sum_{i=1}^{x} i = \frac{x(x + 1)}{2} \)) for all numbers \( x \). First, we have to cast our problem as a predicate about natural numbers. This is easy: say “let \( P(n) \) be the proposition that \[ \sum_{i=1}^{n} i = \frac{n(n + 1)}{2} \] Then, we satisfy the requirements of induction: 1. **Base Case**: We prove that \( P(1) \) is true simply by plugging it in. Setting \( n = 1 \), we have: \[ \sum_{i=1}^{1} i = \frac{1(1 + 1)}{2} \] \[ 1 = \frac{1(2)}{2} \] \[ 1 = 1 \quad \text{✓} \] 2. **Inductive Step**: We now must prove that \( P(k) \) implies \( P(k + 1) \). Put another way, we assume \( P(k) \) is true, and then use that assumption to prove that \( P(k + 1) \) is also true. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 250 Context: # CHAPTER 9. PROOF It can obviously be written as just itself, which is the product of one prime. (23 can be written as "23".) But suppose it's not. Then, it can be broken down as the product of two numbers, each less than itself. (21 can be broken down as 7 * 3; 24 can be broken down as 6 * 4 or 12 * 2 or 8 * 3, take your pick.) Now we know nothing special about those two numbers... except the fact that the inductive hypothesis tells us that all numbers less than \( k + 1 \) are expressible as the product of one or more primes! So these two numbers, whatever they may be, are expressible as the product of primes, and so when you multiply them together to get \( k + 1 \), you will have a longer string of primes multiplied together. Therefore, \((\forall k < P(k)) \Rightarrow P(k + 1)\). ## Conclusion Therefore, by the strong form of mathematical induction, \(\forall n \geq 2 \, P(n)\). You can see why we needed the strong form here. If we wanted to prove that 15 is expressible as the product of primes, knowing that 14 is expressible as the product of primes doesn't do us a lick of good. What we needed to know was that 5 and 3 were expressible in that way. In general, the strong form of induction is useful when you have to break something into smaller parts, but there's no guarantee that the parts will be “one less” than the original. You only know that they'll be smaller than the original. A similar example follows. ## Example 2 Earlier (p. 111) we stated that every tree has one less edge than node. Prove it. Let \( P(n) \) be the proposition that a free tree with \( n \) nodes has \( n - 1 \) edges. 1. **Base case.** \( P(1) \) is true, since a free tree with 1 node is just a single lonely node, and has no edges. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 252 Context: # 9.4 Final Word Finding proofs is an art. 
In some ways, it’s like programming: you have a set of building blocks, each one defined very precisely, and your goal is to figure out how to assemble those blocks into a structure that starts with only axioms and ends with your conclusion. It takes skill, patience, practice, and sometimes a little bit of luck. Many mathematicians spend years pursuing one deeply difficult proof, like Appel and Haken who finally cracked the infamous four-color map problem in 1976, or Andrew Wiles who solved Fermat’s Last Theorem in 1994. Some famous mathematical properties may never have proofs, such as Christian Goldbach’s 1742 conjecture that every even integer is the sum of two primes; or the most elusive and important question in computing theory: does P=NP? (Put very simply: if you consider the class of problems where it’s easy to verify a solution once you have it, but crazy hard to find it in the first place, is there actually an easy algorithm for finding the solution that we just haven’t figured out yet?) Most computer scientists think “no,” but despite a mind-boggling number of hours invested by the brightest minds in the world, no one has ever been able to prove it one way or the other. Most practicing computer scientists spend time taking advantage of the known results about mathematical objects and structures, and rarely (if ever) have to construct a watertight proof about them. For the more theoretically-minded student, however, who enjoys probing the basis behind the tools and speculating about additional properties that might exist, devising proofs is an essential skill that can also be very rewarding. #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 253 Context: # Index - n-choose-k notation, 156 - n-to-the-k-falling operator, 152 - a priori, 68 - modus ponens, 201, 226 - modus tollens, 226 - quad erat demonstrandum (Q.E.D.), 226 - reductio ad absurdum, 229 - acyclic (graphs), 91 - additivity property, 63 - adjacent (vertices), 89 - algorithm, 97, 127, 128, 132, 149, 150 - Ali, Muhammad, 92 - American Idol, 62, 68 - ancestor (of a node), 115 - and (logical operator), 18, 199, 203 - antisymmetric (relation), 40, 43 - Appel, Kenneth, 244 - arrays, 13 - artificial intelligence (AI), 197, 201, 225 - associative, 20 - asymmetric (relation), 41 - ATM machines, 143 - atomic (propositions), 198 - AVL trees, 133 - axioms, 226, 229 - background knowledge, 68, 70 - balancedness (of a tree), 132 - base case (of a proof), 233, 240 - bases (of number systems), 166, 168, 170 - Bayes' Theorem, 75 - Bayes, Thomas, 67 - Bayesian, 66 - BFT (breadth-first traversal), 95, 97 - Big-O notation, 127 - bijective (function), 49 - binary numbers, 25, 177, 178, 180, 182 - binary search trees, 123, 125 - binomial coefficients, 156 - bit, 177 - Booth, John Wilkes, 86 - BST property, 125, 131 - byte, 180 #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 254 Context: ``` # CHAPTER 9. 
PROOF - Cantor, Georg, 7, 12, 17 - capacity (of a byte), 182 - cardinality (of sets), 16, 25, 28, 66 - Carroll, Lewis, 227 - carry-in, 189 - carry-out, 189 - Cartesian product (of sets), - 19, 35 - chess, 114 - child (of a node), 115 - closed interval, 61 - codomain (of a function), 45 - collectively exhaustive, 26 - combinations, 154 - combinators, 25, 141 - commutative, 18, 20, 71 - compilers, 114 - complement laws (of sets), - 21 - complement, partial (of sets), - 18 - complement, total (of sets), - 18, 65, 146, 162 - complete binary tree, 121 - conclusion (of implication), 200 - conditional probability, - 68, 72, 74, 78 - congruent, 173 - conjunction, 199, 208 - connected (vertices/graphs), - 89, 95 - coordinates, 15 - curly brace notation, 11 - cycles, 90 ## DAGs (directed acyclic graphs), 90 - data structures, 85 - Davies family, 8, 9, 26, 147, 154 - De Morgan's laws, 21, 22, 207, 208 - decimal numbers, 165, 169, 173, 178 - degree (of a vertex), 90 - depth (of a node), 115 - dequeuing, 95 - descendant (of a node), 115 - DFT (depth-first traversal), 99, 101 ## Dijkstra's algorithm, 101, 104 - Dijkstra, Edsger, 101 - direct proof, 227 - directed graphs, 88, 91 - disjunction, 199, 208, 226 - disjunctive syllogism, 226 - disk sectors, 156 - distributive, 20, 208, 227 - domain (of a function), 45 - domain of discourse (?), 9, - 19, 21, 27, 60, 210 - domination laws (of sets), 21 - dominors, 234 - drinking age, 232, 234 - duplicates (in sets), 13 - edges, 86, 87 - elements (of sets), 8, 15, 23 - ellipses, 12 - empty graph, 87 - empty set, 9, 16, 21, 24, 25, - 36, 114 - endorelations, 38, 93 ``` #################### File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf Page: 257 Context: 9.4. FINAL WORD ================= - ordered triples, 15 - org charts, 113 - outcomes, 60, 62 - overflow, 188 P = NP?, 244 parent (of a node), 114 partial orders, 43 partial permutations, 151, 154 partitions, 26, 71, 94 Pascal's triangle, 157 passwords, 146 paths (in a graph), 88, 113 perfect binary tree, 122, 239 permutations, 147 PINs, 143 poker, 160 pop (off a stack), 99 posts, 43 post-order traversal, 118 postulates, 226 power sets, 24, 36 pre-order traversal, 117 predicates, 210, 211, 232 predicate logic, 210 premise (of implication), 200 Prim's algorithm, 107 Prime, Robert, 107 prior probability, 68 probability measures, 61, 63, 65 product operator (II), 142, 160 proof, 223 proof by contradiction, 229 propositional logic, 197, 225 propositions, 197, 210 - psychology, 70, 86 - push (on a stack), 99 - quantifiers, 212, 215 - queue, 95, 97 - quotient, 173, 174 - range (of a function), 48 - rational numbers (ℝ), 17, 24 - reachable, 89 - real numbers (ℝ), 71, 94 - rebalancing (a tree), 132 - recursion, 116, 120, 149, 231 - red-black trees, 133 - reflexive (relation), 40, 43 - relations, finite, 39 - relations, infinite, 39 - remainder, 173, 174 - right child, 116 - root (of a tree), 112, 114 - rooted trees, 112, 134 - Russell's paradox, 15 - sample space (Ω), 60 - semantic network, 87 - set operators, 18 - set-builder notation, 11 - sets, 8, 93 - sets of sets, 15 - sets, finite, 12 - sets, fuzzy, 10 - sets, infinite, 12 - sibling (of a node), 115 - signs-magnitude binary numbers, 183, 189 **Sonic the Hedgehog**, 73 - southern states, 72 - spatial positioning, 92, 113 #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 6 Context: # A Brief Introduction to Machine 
Learning for Engineers Osvaldo Simeone¹ ¹Department of Informatics, King’s College London; osvaldo.simeone@kcl.ac.uk ## ABSTRACT This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. The material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with an engineering background in probability and linear algebra. --- **ISSN:** DOI XXXXXXXXXX ©2018 XXXXXXXXXX #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 13 Context: # 1.1. What is Machine Learning? This starts with an in-depth analysis of the problem domain, which culminates with the definition of a mathematical model. The mathematical model is meant to capture the key features of the problem under study and is typically the result of the work of a number of experts. The mathematical model is finally leveraged to derive hand-crafted solutions to the problem. For instance, consider the problem of defining a chemical process to produce a given molecule. The conventional flow requires chemists to leverage their knowledge of models that predict the outcome of individual chemical reactions, in order to craft a sequence of suitable steps that synthesize the desired molecule. Another example is the design of speech translation or image/video compression algorithms. Both of these tasks involve the definition of models and algorithms by teams of experts, such as linguists, psychologists, and signal processing practitioners, not infrequently during the course of long standardization meetings. The engineering design flow outlined above may be too costly and inefficient for problems in which faster or less expensive solutions are desirable. The machine learning alternative is to collect large data sets, e.g., of labeled speech, images, or videos, and to use this information to train general-purpose learning machines to carry out the desired task. While the standard engineering flow relies on domain knowledge and on design optimized for the problem at hand, machine learning lets large amounts of data dictate algorithms and solutions. To this end, rather than requiring a precise model of the set-up under study, machine learning requires the specification of an objective, of a model to be trained, and of an optimization technique. Returning to the first example above, a machine learning approach would proceed by training a general-purpose machine to predict the outcome of known chemical reactions based on a large data set, and then by using the trained algorithm to explore ways to produce more complex molecules. In a similar manner, large data sets of images or videos would be used to train a general-purpose algorithm with the aim of obtaining compressed representations from which the original input can be recovered with some distortion. 
#################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 17 Context: # 1.3 Goals and Outline This monograph considers only passive and offline learning. ## 1.3 Goals and Outline This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. Unlike other texts that are focused on one particular aspect of the field, an effort has been made here to provide a broad but concise overview in which the main ideas and techniques are systematically presented. Specifically, the material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra. A prior exposure to information theory is useful but not required. Detailed discussions are provided on basic concepts and ideas, including overfitting and generalization, Maximum Likelihood and regularization, and Bayesian inference. The text also endeavors to provide intuitive explanations and pointers to advanced topics and research directions. Sections and subsections containing more advanced material that may be regarded at a first reading are marked with a star (+). The reader will find here further discussions on computing platform or programming frameworks, such as map-reduce, nor details on specific applications involving large data sets. These can be easily found in a vast and growing body of work. Furthermore, rather than providing exhaustive details on the existing myriad solutions in each specific category, techniques have been selected that are useful to illustrate the most salient aspects. Historical notes have also been provided only for a few selected milestone events. Finally, the monograph attempts to strike a balance between the algorithmic and theoretical viewpoints. In particular, all learning algorithms are presented in a manner that emphasizes their theoretical foundations while also providing practical insights. #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 20 Context: # Introduction from *The Computer and the Brain*, 1958. #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 49 Context: # 2.5. Minimum Description Length (MDL) In a gray scale image, fix some probability mass function (pmf) \( p(x) \) on this alphabet. A key result in information theory states that it is possible to design a lossless compression scheme that uses \( [-\log p(x)] \) bits to represent value \( x \). By virtue of this result, the choice of a probability distribution \( p(x) \) is akin to the selection of a lossless compression scheme that produces a description of around \( -\log p(x) \) bits to represent value \( x \). 
Note that the description length \( -\log p(x) \) decreases with the probability assigned by \( p(x) \) to value \( x \): more likely values under \( p(x) \) are assigned a smaller description. Importantly, a decoder would need to know \( p(x) \) in order to recover \( x \) from the bit description. At an informal level, the MDL criterion prescribes the selection of a model that compresses the training data to the shortest possible description. In other words, the model selected by MDL defines a compression scheme that describes the data set \( D \) with the minimum number of bits. As such, the MDL principle can be thought of as a formalization of Occam’s razor: choose the model that yields the simplest explanation of the data. As we will see below, this criterion naturally leads to a solution that penalizes overfitting. ### What is the length of the description of a data set \( D \) that results from the selection of a specific value of \( M \)? The answer is not straightforward, since, for a given value of \( M \), there are as many probability distributions as there are values for the corresponding parameters \( \theta \) to choose from. A formal calculation of the description length would hence require the introduction of the concept of universal compression for a given probabilistic model \( \theta \). Here, we will limit ourselves to a particular class of universal codes known as two-part codes. Using two-part codes, we can compute the description length for the data \( D \) that results from the choice of a model over \( M \) as follows. First, we obtain the ML solution \( (m_{ML}, \hat{\theta}_{ML}) \). Then, we describe the data set by using a compression scheme defined by the probability \( p(x | m_{ML}, \hat{\theta}_{ML}) \). As discussed, this pro- #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 56 Context: 50 # A Gentle Introduction through Linear Regression When the number of data points goes to infinity; while the latter is parameterized and is subject to optimization. For this reason, divergence metrics between the two distributions play an important role in the development of learning algorithms. We will see in the rest of the monograph that other frequentist methods may involve a single distribution rather than two, as discussed in Sec. 6.6 or an additional, so-called variational distribution, as covered in Sec. 6.3 and Chapter 8. In contrast, Bayesian methods posit a single coherent distribution over the data and the parameters, and frame the learning problem as one of inference of unobserved variables. As we will discuss in Chapter 8, variational Bayesian methods also introduce an additional variational distribution and are a building block for frequentist learning in the presence of unobserved variables. The running example in this chapter has been one of linear regression for a Gaussian model. The next chapter provides the necessary tools to construct and learn more general probabilistic models. #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 64 Context: ```markdown setting \( C = 2 \) recovers the Bernoulli distribution. 
PMFs in this model are given as: \[ \text{Cat}(x| \mu) = \prod_{k=0}^{C-1} \mu_k^{I(x=k)} \times (1 - \sum_{k=1}^{C-1} \mu_k) \quad (3.16) \] where we have defined \( \mu_k = \Pr[x = k] \) for \( k = 1, \ldots, C - 1 \) and \( \mu_0 = 1 - \sum_{k=1}^{C-1} \mu_k = \Pr[x = 0] \). The log-likelihood (LL) function is given as: \[ \ln(\text{Cat}(x|\mu)) = \sum_{k=1}^{C-1} I(x=k) \ln \mu_k + \ln \mu_0 \quad (3.17) \] This demonstrates that the categorical model is in the exponential family, with sufficient statistics vector \( u(x) = [I(x = 0) \ldots I(x = C - 1)]^T \) and measure function \( m(x) = 1 \). Furthermore, the mean parameters \( \mu = [\mu_1, \ldots, \mu_{C-1}]^T \) are related to the natural parameter vector \( \eta = [\eta_1, \ldots, \eta_{C-1}]^T \) by the mapping: \[ \eta_k = \ln \left( \frac{\mu_k}{1 - \sum_{j=1}^{C-1} \mu_j} \right) \quad (3.18) \] which again takes the form of an LLR. The inverse mapping is given by: \[ \mu = \left( \frac{e^{\eta_k}}{1 + \sum_{j=1}^{C-1} e^{\eta_j}} \right) \quad (3.19) \] The parameterization given here is minimal, since the sufficient statistics \( u(x) \) are linearly independent. An overcomplete representation would instead include in the vector of sufficient statistics also the function \( I(x=0) \). In this case, the resulting vector of sufficient statistics is: \[ u(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1 & \text{if } x = C - 1 \end{cases} \quad (3.20) \] which is known as one-hot encoding of the categorical variable, since only one entry equals 1 while the others are zero. Furthermore, with this... ``` #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 86 Context: write \( p(C_k | x) \) for the discriminative probability when more convenient. - **Generative probabilistic model**: Model the joint distribution of domain point and class label by specifying the prior distribution \( p(t) \) or \( p(C_k) \), and the class-dependent probability distribution \( p(x | C_k) \), or \( p(x | C_k) \), of the domain points within each class. Discriminative models are arguably to be considered as setting the current state of the art on classification, including popular methods such as Support Vector Machine (SVM) and deep neural networks. Generative models are potentially more flexible and powerful as they allow to capture distinct class-dependent properties of the covariates \( x \). ### 4.3 Discriminative Deterministic Models In this section, we discuss binary classification using discriminative deterministic models. Owing to their practical importance and to their intuitive geometric properties, we focus on linear models, whereby the binary prediction \( \hat{y}(x) \) is obtained by applying a threshold rule on a decision variable \( a(x, \mathbf{w}) \) obtained as a linear function of the learnable weights \( \mathbf{w} \) (the notation will be introduced below). Note that the decision variable \( a(x, \mathbf{w}) \) may not be a linear function of the covariates \( x \). As we will discuss, this class of models underlie important algorithms that are extensively used in practical applications such as SVM. A brief discussion on multi-class classification using deterministic models is provided at the end of this section. In the next two sections, we cover discriminative probabilistic models, including GLMs and more general models. 
#### 4.3.1 Model In their simplest form, linear discriminative deterministic classification models are of the form \[ \hat{y}(x, \mathbf{w}) = \text{sign}(a(x, \mathbf{w})) \tag{4.5} \] #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 120 Context: # 5.1 A Formal Framework for Supervised Learning In this chapter, we concentrate on discriminative deterministic models for binary classification, as it is typically done in statistical learning theory. We also focus on the standard 0-1 loss \( \ell(t, \hat{t}) = \mathbb{I}\{t \neq \hat{t}\} \), for which the generalization loss is the probability of error. The labels \( t \) for the two classes take values in the set \{0, 1\} (cf. (4.4)). The learning problem is formalized as follows. Assume that a model, or hypothesis class, \( \mathcal{H} \) has been selected. This set contains a possibly uncountable number of predictors \( i \) that map each point \( x \) in the domain space to a label \( \ell(x) \in \{0, 1\} \). We would like to choose a specific hypothesis, or predictor, \( \hat{f} \in \mathcal{H} \) that minimizes the generalization error (cf. (2.2)). \[ L_{\mathcal{H}}(\hat{f}) = \mathbb{E}_{x \sim \mathcal{D}}[\ell(t, \hat{f}(x))]. \tag{5.1} \] Solving this inference problem would yield an optimal model within class \( \mathcal{H} \) as \[ \hat{f} \in \arg\min_{f \in \mathcal{H}} L(f). \tag{5.2} \] The notation in (5.2) emphasizes that there may be multiple optimal hypotheses returned by the minimization of the generalization error \( L(\hat{f}) \). Nevertheless, to fix the ideas, it is useful to think of the case in which there is a unique optimal hypothesis. This is, for instance, the case when the loss function is strictly convex. Obtaining the optimal predictor (5.2) requires knowledge of the true distribution \( p(x, t) \), which is not available. ## Example 5.1 For the linear (deterministic) methods studied in Chapter 4, the model is defined as \[ \mathcal{H} = \{t_b(x) = \langle w, x \rangle + w_0\} \tag{5.3} \] with \( w \in \mathbb{R}^d \), and similarly for the feature-based version. Identifying a hypothesis within this class requires the selection of the weight vector \( w \in \mathbb{R}^d \). In lieu of the true distribution \( p(x, t) \), what is available is an i.i.d. training set \[ \mathcal{D} = \{(x_n, t_n)\}_{n=1}^N \sim p(x, t). \tag{5.4} \] #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 127 Context: 5.3. PAC Learnability for Finite Hypothesis Classes ======================================== As the \( \text{ERM} \) sample complexity for class \( \mathcal{H} \) depends on the capacity of the hypothesis class, defined as \( |\mathcal{H}| \) (bits) or \( \log |\mathcal{H}| \) (bits). This is the number of bits required to index the hypotheses in \( \mathcal{H} \). It is also interesting to note that increasing the accuracy, i.e., decreasing \( \delta \), is more demanding than increasing the confidence, that is, decreasing \( \delta \), in terms of sample complexity. Another way to understand the result (5.11) is that, with \( N \) data points, we can achieve the estimation error \[ \varepsilon = \sqrt{\frac{2 \ln(2/|\mathcal{H}|/\delta)}{N}} \tag{5.12} \] with probability \( 1 - \delta \), by using ERM. 
As a result, with \( N \) data points, we can upper bound the generalization loss of ERM as \[ L_{\mathcal{H}}(\text{ERM}) \leq L_{\mathcal{H}}(h^*) + \sqrt{\frac{2 |\mathcal{H}|}{N}} \tag{5.13} \] with probability \( 1 - \delta \). In words, ERM achieves the optimal generalization loss with an estimation error that scales with the square root of the model capacity and with the inverse square root of \( N \). Another important note, under the realizability assumption, the theorem can be modified to yield the smaller upper bound [133] \[ N_{\mathcal{H}}^*(\varepsilon, \delta) \leq \frac{|\mathcal{H}|}{\varepsilon} + \frac{\ln(1/\delta)}{\varepsilon} \leq N_{\text{ERM}}^{\mathcal{H}}(\varepsilon, \delta), \tag{5.14} \] which is also achievable by ERM. What does the theorem show about infinite models such as the linear classifier (6.3)? One approach is to learn a “quantized” version of \( \mathcal{H} \), say \( \mathcal{H}_b \), in which each weight is represented by \( b \) bits or, equivalently, as one of \( 2^b \) pre-determined quantization levels. As a result, the number of hypotheses in the hypothesis class \( \mathcal{H} \) is \( |\mathcal{H}| = (D + 1)^{2} \), and the capacity of the hypothesis class is \( |H| = (D + 1) \ln(2) \) (bits) or \( |\mathcal{H}| = b(D + 1) \) (bits). It follows that, using (5.3), we obtain the ERM sample complexity \[ N_{\text{ERM}}^{\mathcal{H}}(\varepsilon, \delta) \geq \frac{2b(D + 1) + 2 \ln(2/\delta)}{\varepsilon^2} \tag{5.15} \] #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 176 Context: ``` to conditional independence assumptions. For instance, in the naive Bayes model, the word indicators are conditionally independent given the topic. As we will see in the rest of this chapter, conditional independence assumptions translate into factorizations of the joint distributions of the variables under study. Factorizations and associated conditional independence properties can be represented by three different graphical frameworks, namely BNs, MRFs, and factor graphs. For brevity, this chapter will only focus on the first two. ## 7.2 Bayesian Networks This section provides a brief introduction to BNs by focusing on key definitions and on the problem of ML learning with some note on MAP and Bayesian viewpoints. ### 7.2.1 Definitions and Basics BNs encode a probability factorization or, equivalently, a set of conditional independence relationships through a directed graph. The starting point is the chain rule for probabilities for a generic set of \( K \) random variables (rvs) \( \{x_1, ..., x_K\} \): \[ p(x_1, ..., x_K) = p(x_1)p(x_2|x_1)...p(x_K|x_1, ..., x_{K-1}) \tag{7.3} \] where the order of the variables is arbitrary. The factorization (7.3) applies for a generic joint distribution, and it does not encode any additional structural information. Note that the notation here is general and not meant to indicate that the variables are necessary observed. #### Example 7.3 Consider again the naive Bayes model for text classification/clustering. There, we imposed the structural constraint that word indicators \(\{x_w\}_{w=1}^W\) be conditionally independent given the topic \(t\). This conditional independence assumption can be expressed using the “perp” notation: \[ x_w \perp \{x_{w'}: w' \neq w\}|t. 
\tag{7.4} \] ``` #################### File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf Page: 213 Context: # Appendix A: Information Measures In this appendix, we describe a principled and intuitive introduction to information measures that builds on inference, namely estimation and hypothesis testing. We focus on entropy, mutual information, and divergence measures. We also concentrate on discrete r.v.s. In the monograph, we have taken the pragmatic approach of extending the definitions to continuous variables by substituting sums with integrals. It is worth noting that this approach does not come with any practical complications when dealing with mutual information and divergence. Instead, the continuous version of the entropy, known as differential entropy, should be treated with care, as it does not satisfy some key properties of the entropy such as non-negativity. ## A.1 Entropy As proposed by Claude Shannon, the amount of information received from the observation of a discrete random variable \( X \sim p(x) \) defined over a finite alphabet should be measured by the amount of uncertainty about its value prior to its measurement. To this end, we consider the problem of estimating the value of \( x \) when one only knows the following: #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 6 Context: # PREFACE about 60% correct on 100 categories, the fact that we pull it off seemingly effortlessly serves as a “proof of concept” that it can be done. But there is no doubt in my mind that building truly intelligent machines will involve learning from data. The first reason for the recent successes of machine learning and the growth of the field as a whole is rooted in its multidisciplinary character. Machine learning emerged from AI but quickly incorporated ideas from fields as diverse as statistics, probability, computer science, information theory, convex optimization, control theory, cognitive science, theoretical neuroscience, physics, and more. To give an example, the main conference in this field is called: **advances in neural information processing systems**, referring to information theory and theoretical neuroscience and cognitive science. The second, perhaps more important reason for the growth of machine learning is the exponential growth of both available data and computer power. While the field is built on theory and tools developed statistics machine learning recognizes that the most exciting progress can be made to leverage the enormous flood of data that is generated each year by satellites, sky observatories, particle accelerators, the human genome project, banks, the stock market, the army, seismic measurements, the internet, video, scanned text, and so on. It is difficult to appreciate the exponential growth of data that our society is generating. To give an example, a modern satellite generates roughly the same amount of data all previous satellites produced together. This insight has shifted the attention from highly sophisticated modeling techniques on small datasets to more basic analysis on such larger data-sets (the latter sometimes called **data-mining**). Hence the emphasis shifted to algorithmic efficiency and as a result many machine learning faculty (like myself) can typically be found in computer science departments. 
To give some examples of recent successes of this approach one would only have to turn on one computer and perform an internet search. Modern search engines do not run terribly sophisticated algorithms, but they manage to store and sift through almost the entire content of the internet to return sensible search results. There has also been much success in the field of machine translation, not because a new model was invented but because many more translated documents became available. The field of machine learning is multifaceted and expanding fast. To sample a few sub-disciplines: statistical learning, kernel methods, graphical models, artificial neural networks, fuzzy logic, Bayesian methods, and so on. The field also covers many types of learning problems, such as supervised learning, supervised learning, semi-supervised learning, active learning, reinforcement learning, etc. I will only cover the most basic approaches in this book from a highly per #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 7 Context: # MEANT FOR INDUSTRY AS WELL AS BACKGROUND READING This book was written during my sabbatical at the Radboud University in Nijmegen (Netherlands). I would like to thank Hans for discussion on intuition. I also thank Prof. Bert Kappen, who leads an excellent group of postdocs and students for his hospitality. Marga, kids, UCI,... --- There are a few main aspects I want to cover from a personal perspective. Instead of trying to address every aspect of the entire field, I have chosen to present a few popular and perhaps useful tools and approaches. What will (hopefully) be significantly different than most other scientific books is the manner in which I will present these methods. I have always been frustrated by the lack of proper explanation of equations. Many times, I have been staring at a formula without the slightest clue where it came from or how it was derived. Many books also excel in stating facts in an almost encyclopedic style, without providing the proper intuition of the method. This is my primary mission: to write a book that conveys intuition. The first chapter will be devoted to why I think this is important. #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 9 Context: # Learning and Intuition We have all experienced the situation that the solution to a problem presents itself while riding your bike, walking home, “relaxing” in the washroom, waking up in the morning, taking your shower, etc. Importantly, it did not appear while banging your head against the problem in a conscious effort to solve it, staring at the equations on a piece of paper. In fact, I would claim that all my bits and pieces of progress have occurred while taking a break and “relaxing out of the problem.” Greek philosophers walked in circles when thinking about a problem; most of us stare at a computer screen all day. The purpose of this chapter is to make you more aware of where your creative mind is located and to interact with it in a fruitful manner. My general thesis is that, contrary to popular belief, creative thinking is not performed by conscious thinking. It is rather an interplay between your conscious mind, which prepares the seeds to be planted into the unconscious part of your mind. The unconscious mind will munch on the problem “out of sight” and return promising roads to solutions to the consciousness. 
This process iterates until the conscious mind decides the problem is sufficiently solved, intractable, or plain dull and moves on to the next. It may be a little unsettling to learn that at least part of your thinking goes on in a part of your mind that seems inaccessible and has a very limited interface with what you think of as yourself. But it is undeniable that it is there, and it is also undeniable that it plays a role in the creative thought process. To become a creative thinker, one should learn to play this game more effectively. To do so, we should think about the language in which to represent knowledge that is most effective in terms of communication with the unconscious. In other words, what type of “interface” between conscious and unconscious mind should we use? It is probably not a good idea to memorize all the details of a complicated equation or problem. Instead, we should extract the abstract idea and capture the essence of it in a picture. This could be a movie with colors and other elements. #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 11 Context: Many people may find this somewhat experimental way to introduce students to new topics counter-productive. Undoubtedly for many it will be. If you feel under-challenged and become bored, I recommend moving on to the more advanced textbooks, of which there are many excellent samples on the market (for a list see [books](#)). But I hope that for most beginning students, this intuitive style of writing may help to gain a deeper understanding of the ideas that I will present in the following. Above all, have fun! #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 15 Context: # 1.1. DATA REPRESENTATION Most datasets can be represented as a matrix, \( X = [X_{n,k}] \), with rows indexed by "attribute-index" \( i \) and columns indexed by "data-index" \( n \). The value \( X_{n,k} \) for attribute \( i \) and data-case \( n \) can be binary, real, discrete, etc., depending on what we measure. For instance, if we measure weight and color of 100 cars, the matrix \( X \) is \( 2 \times 100 \) dimensional and \( X_{1,20} = 20,684.57 \) is the weight of car nr. 20 in some units (a real value) while \( X_{2,20} = 2 \) is the color of car nr. 20 (say one of 6 predefined colors). Most datasets can be cast in this form (but not all). For documents, we can give each distinct word of a prespecified vocabulary a number and simply count how often a word was present. Say the word "book" is defined to have nr. 10, 568 in the vocabulary then \( X_{1,1005} = 4 \) would mean: the word book appeared 4 times in document 5076. Sometimes the different data-cases do not have the same number of attributes. Consider searching the internet for images about cats. You’ll retrieve a large variety of images most with a different number of pixels. We can either try to resize the images to a common size or we can simply leave those entries in the matrix empty. It may also occur that a certain entry is supposed to be there but it couldn’t be measured. For instance, if we run an optical character recognition system on a scanned document some letters will not be recognized. We’ll use a question mark “?” to indicate that that entry wasn’t observed. It is very important to realize that there are many ways to represent data and not all are equally suitable for analysis. 
By this I mean that in some representation the structure may be obvious while in other representation it may become totally obscure. It is still there, but just harder to find. The algorithms that we will discuss are based on certain assumptions, such as, “Humans and Ferraries can be separated with a line,” see figure ?. While this may be true if we measure weight in kilograms and height in meters, it is no longer true if we decide to re-code these numbers into bit-strings. The structure is still in the data, but we would need a much more complex assumption to discover it. A lesson to be learned is thus to spend some time thinking about in which representation the structure is as obvious as possible and transform the data if necessary before applying standard algorithms. In the next section we’ll discuss some standard preprocessing operations. It is often advisable to visualize the data before preprocessing and analyzing it. This will often tell you if the structure is a good match for the algorithm you had in mind for further analysis. Chapter ? will discuss some elementary visualization techniques. #################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 25 Context: # Information Theory and Image Compression The part of the information that does not carry over to the future, the unpredictable information, is called "noise." There is also the information that is predictable, the learnable part of the information stream. The task of any learning algorithm is to separate the predictable part from the unpredictable part. Now imagine Bob wants to send an image to Alice. He has to pay 1 dollar cent for every bit that he sends. If the image were completely white, it would be really stupid of Bob to send the message: ``` pixel 1: white, pixel 2: white, pixel 3: white, .... ``` He could just have sent the message: ``` all pixels are white! ``` The blank image is completely predictable but carries very little information. Now imagine a image that consists of white noise (your television screen if the cable is not connected). To send the exact image Bob will have to send: ``` pixel 1: white, pixel 2: black, pixel 3: black, .... ``` Bob cannot do better because there is no predictable information in that image, i.e., there is no structure to be modeled. You can imagine playing a game and revealing one pixel at a time to someone and paying him 15 for every next pixel he predicts correctly. For the white image, you can do perfect for the noisy picture you would be random guessing. Real pictures are in between: some pixels are very hard to predict, while others are easier. To compress the image, Bob can extract rules such as: always predict the same color as the majority of the pixels next to you, except when there is an edge. These rules constitute the model for the regularities of the image. Instead of sending the entire image pixel by pixel, Bob will now first send his rules and ask Alice to apply the rules. Every time the rule fails, Bob also sends a correction: ``` pixel 103: white, pixel 245: black. ``` A few rules and two corrections are obviously cheaper than 256 pixel values and no rules. There is one fundamental tradeoff hidden in this game. Since Bob is sending only a single image, it does not pay to send an incredibly complicated model that would require more bits to explain than simply sending all pixel values. 
If Bob were sending 1 billion images, it would pay off to first send the complicated model, because he would be saving a fraction of the bits for every image. On the other hand, if Bob wants to send 2 pixels, there really is no need to send a model whatsoever. Therefore, the size of Bob's model depends on the amount of data he wants to transmit. In this context, the boundary between what is model and what is noise depends on how much data we are dealing with! If we use a model that is too complex, we overfit to the data at hand, i.e., part of the model represents noise. On the other hand, if we use too simple a model, we "underfit" (over-generalize) and valuable structure remains unmodeled. Both lead to sub-optimal compression of the image. The compression game can therefore be used to find the right model complexity for a given dataset.

#################### File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf Page: 38 Context:

# CHAPTER 6. THE NAIVE BAYESIAN CLASSIFIER

An example of the traffic that spam generates: the University of California Irvine receives on the order of 2 million spam emails a day. Fortunately, the bulk of these emails (approximately 97%) is filtered out or dumped into your spam box and will never reach your attention. How is this done? Well, it turns out to be a classic example of a classification problem: spam or ham, that’s the question. Let’s say that spam will receive a label 1 and ham a label 0. Our task is thus to label each new email with either 0 or 1. What are the attributes? Rephrasing this question: what would you measure in an email to see if it is spam? Certainly, if I read “viagra” in the subject I would stop right there and dump it in the spam box. What else? Here are a few: “enlargement,” “cheap,” “buy,” “pharmacy,” “money,” “loan,” “mortgage,” “credit,” and so on. We can build a dictionary of words that we can detect in each email. This dictionary could also include word phrases such as “buy now” and “penis enlargement”; one can make the phrases as sophisticated as necessary. One could measure whether the words or phrases appear at least once, or one could count the actual number of times they appear. Spammers know about the way these spam filters work and counteract by slight misspellings of certain key words. Hence we might also want to detect words like “via gra” and so on. In fact, a small arms race has ensued in which spam filters and spam generators find tricks to counteract the tricks of the “opponent.” Putting all these subtleties aside for a moment, we’ll simply assume that we measure a number of these attributes for every email in a dataset. We’ll also assume that we have spam/ham labels for these emails, which were acquired by someone removing spam emails by hand from his/her inbox. Our task is then to train a predictor for spam/ham labels for future emails, where we have access to attributes but not to labels. The NB model is what we call a "generative" model. This means that we imagine how the data was generated in an abstract sense. For emails, this works as follows: an imaginary entity first decides how many spam and ham emails it will generate on a daily basis. Say it decides to generate 40% spam and 60% ham. We will assume this doesn’t change with time (of course it does, but we will make this simplifying assumption for now). It will then decide what the chance is that a certain word appears \( x \) times in a spam email (a toy version of this generative picture is sketched below).
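A minimal sketch of this generative picture, assuming a bag-of-words representation; the training data, word lists, and add-one smoothing below are my own illustration, not the book's:

```python
from collections import Counter
import math

# Toy training data: bag-of-words counts with spam (1) / ham (0) labels.
# All words, counts, and labels here are invented for illustration.
train = [
    (Counter({"viagra": 2, "buy": 1}), 1),
    (Counter({"cheap": 1, "loan": 1}), 1),
    (Counter({"meeting": 1, "report": 2}), 0),
    (Counter({"lunch": 1, "meeting": 1}), 0),
]
vocab = {w for bag, _ in train for w in bag}

def train_nb(data):
    """Estimate class priors and per-class word counts."""
    prior, word_counts = Counter(), {0: Counter(), 1: Counter()}
    for bag, label in data:
        prior[label] += 1
        word_counts[label].update(bag)
    return prior, word_counts

def predict(bag, prior, word_counts):
    """Pick the label maximizing log P(label) + sum_w count(w) * log P(w | label)."""
    scores = {}
    for label in (0, 1):
        total = sum(word_counts[label].values())
        s = math.log(prior[label] / sum(prior.values()))
        for w, c in bag.items():
            # add-one (Laplace) smoothing over the vocabulary
            p = (word_counts[label][w] + 1) / (total + len(vocab))
            s += c * math.log(p)
        scores[label] = s
    return max(scores, key=scores.get)

prior, wc = train_nb(train)
print(predict(Counter({"viagra": 1, "cheap": 1}), prior, wc))  # -> 1 (spam)
```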
For example, the word “viagra” has a chance of 96% of not appearing at all, 1% of appearing once, 0.9% of appearing twice, etc. These probabilities are clearly different for spam and ham. “Viagra” should have a much smaller probability of appearing in a ham email (though not zero, of course; consider that I am sending this very text to my publisher by email). Given these probabilities, we could then go on and try to generate emails that actually look like real emails, i.e., with proper sentences, but we won’t need that in the following. Instead we make the simplifying assumption that an email consists of “a bag of words,” in random order.

##########

"""QUERY: can you tell me what is discrete math in 500 words?""" Consider the chat history for relevant information. Use all information included. Use as many tokens as needed. Important: If you find information separated by a | in the context, it is a table formatted in Markdown. Important: Take a look at the QUERY and only the QUERY. If this is vague or unclear, please ignore everything and ask a follow-up question instead! Final Files Sources: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 1, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 2, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 3, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 5, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 6, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 9, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 10, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 11, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 12, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 13, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 15, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 18, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 21, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 23, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 24, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 25, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 27, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 30, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 31, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 33, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 35, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 39, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - 
Page 40, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 43, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 51, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 60, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 65, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 67, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 69, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 72, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 74, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 81, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 85, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 89, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 93, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 95, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 97, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 102, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 103, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 105, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 111, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 116, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 121, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 131, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 135, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 144, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 145, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 149, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 150, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 151, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 153, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 156, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 159, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 161, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 165, 
A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 168, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 170, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 171, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 175, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 188, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 197, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 201, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 205, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 207, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 216, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 217, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 219, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 222, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 224, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 226, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 231, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 233, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 235, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 238, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 240, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 244, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 250, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 252, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 253, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 254, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 257, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 6, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 13, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 17, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 20, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 49, 
A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 56, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 64, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 86, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 120, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 127, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 176, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 213, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 6, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 7, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 9, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 11, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 15, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 25, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 38 ================================================== **Elapsed Time: 2.89 seconds** ================================================== FINAL ANSWER Answer:

Discrete mathematics is the branch of mathematics that deals with discrete elements, i.e., objects that take distinct, separated values. It encompasses a wide range of topics that are fundamental to computer science and information technology. Unlike continuous mathematics, which deals with objects that can vary smoothly, discrete mathematics focuses on objects that can be distinctly separated and counted.

### Key Concepts in Discrete Mathematics

1. **Sets**:
   - **Definition**: A set is a collection of distinct objects, considered as an object in its own right. For example, the set of natural numbers \( \{1, 2, 3, \ldots\} \).
   - **Notation**: Sets are usually denoted by curly braces, e.g., \( \{a, b, c\} \).
   - **Operations**: Common operations on sets include union, intersection, and complement.

2. **Functions**:
   - **Definition**: A function is a relation between a set of inputs and a set of permissible outputs. Each input is related to exactly one output.
   - **Types of Functions**: Functions can be injective (one-to-one), surjective (onto), or bijective (both one-to-one and onto).

3. **Relations**:
   - **Definition**: A relation between two sets is a collection of ordered pairs containing one object from each set.
   - **Properties**: Relations can have properties such as reflexivity, symmetry, and transitivity.

4. **Probability**:
   - **Definition**: Probability is the measure of the likelihood that an event will occur. It quantifies uncertainty.
   - **Applications**: Used in various fields to predict outcomes and model random processes.

5. **Graph Theory**:
   - **Graphs**: A graph is a collection of nodes (or vertices) and edges connecting pairs of nodes. Graphs can be directed or undirected.
   - **Applications**: Graphs are used to model networks, such as social networks, computer networks, and transportation systems.
6. **Trees**:
   - **Definition**: A tree is a special type of graph that is connected and has no cycles. It has a hierarchical structure with a root node and child nodes.
   - **Applications**: Trees are used in data structures, such as binary search trees and heaps, and in algorithms like searching and sorting.

7. **Combinatorics**:
   - **Definition**: Combinatorics is the study of counting, arrangement, and combination of objects.
   - **Applications**: Used in fields like cryptography, coding theory, and algorithm design.

8. **Logic**:
   - **Propositional Logic**: Deals with propositions and their truth values. Uses logical operators like AND, OR, NOT, and IMPLIES.
   - **Predicate Logic**: Extends propositional logic by dealing with predicates and quantifiers like "for all" (∀) and "there exists" (∃).

### Applications of Discrete Mathematics

Discrete mathematics is essential for computer science and information technology. It provides the mathematical foundations for algorithms, data structures, cryptography, network theory, and more. For example, understanding graph theory is crucial for designing efficient network algorithms, while combinatorics is fundamental for analyzing the complexity of algorithms.

### Conclusion

Discrete mathematics is a vital field that underpins much of modern computer science and technology. It deals with distinct and separable objects, providing the tools and frameworks necessary for understanding and solving complex problems in various domains. Whether through the study of sets, functions, relations, probability, graphs, trees, combinatorics, or logic, discrete mathematics offers a rich and diverse set of concepts that are both theoretically intriguing and practically indispensable.

================================================== **Elapsed Time: 9.86 seconds** ==================================================
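==================================================
SUPPLEMENT: CODE SKETCH
To make a few of the concepts in the final answer concrete, here is a small Python sketch (purely illustrative; the sets, function, and relation below are invented, not drawn from the knowledgebase) of the set operations, a function property, and relation properties the answer names:

```python
A = {1, 2, 3}
B = {3, 4}

# Set operations: union, intersection, and difference.
print(A | B)  # {1, 2, 3, 4}
print(A & B)  # {3}
print(A - B)  # {1, 2}

# A function as a dict from inputs to outputs; it is injective (one-to-one)
# iff no two inputs share an output.
f = {1: "a", 2: "b", 3: "a"}
print(len(set(f.values())) == len(f))  # False: 1 and 3 both map to "a"

# A relation on A as a set of ordered pairs; check reflexivity and symmetry.
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}
print(all((x, x) in R for x in A))       # True: reflexive
print(all((y, x) in R for (x, y) in R))  # True: symmetric
```
==================================================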