{
    "query": "can you summarize in 3 sentence what is discrete math?",
    "namespace": "e791d701-717d-4c06-9602-2878acdbb6ad",
    "messages": [
        {
            "role": "assistant",
            "content": "Hallo diverse NF SepTest! Ich bin Simon, der KI-Chatbot, der das gro\u00dfe Sprachmodell gpt-4o-mini-2024-07-18 verwendet, um Ihre Fragen zu beantworten. Ich freue mich, Ihnen behilflich zu sein!\n\nHier sind einige Themen, \u00fcber die wir sprechen k\u00f6nnten: \n\n1. \ud83d\udcda Grundlagen der diskreten Mathematik\n2. \ud83d\udcca Einf\u00fchrung in maschinelles Lernen\n3. \ud83d\udd0d Wahrscheinlichkeitsmodelle und deren Anwendungen\n4. \ud83e\uddee Kombinatorik und Z\u00e4hlprinzipien\n5. \ud83d\udcc8 Klassifikationsalgorithmen im maschinellen Lernen\n6. \ud83e\udde0 Nearest Neighbors Klassifikation\n7. \ud83d\udcc9 Support Vector Machines (SVM)\n8. \ud83d\udd17 Bayesianische Netzwerke und Markov-Zufallsfelder\n9. \ud83d\udccf Dimensionale Reduktionstechniken wie PCA\n10. \ud83d\udd04 Reinforcement Learning und seine Anwendungen\n\nGibt es ein bestimmtes Thema auf dieser Liste, das Sie n\u00e4her erkunden m\u00f6chten?"
        },
        {
            "role": "user",
            "content": "halo"
        },
        {
            "role": "assistant",
            "content": "Hallo! Wie kann ich Ihnen helfen? Gibt es eine spezifische Frage oder ein Thema, \u00fcber das Sie mehr erfahren m\u00f6chten?"
        },
        {
            "role": "user",
            "content": "can you summarize in 3 sentence what is discrete math?"
        },
        {
            "role": "assistant",
            "content": ""
        }
    ],
    "stream": true,
    "language_level": "",
    "chat_channel": ":3997:1726667082838",
    "language": "German",
    "tone": "neutral",
    "writing_style": "standard",
    "model": "gpt-4o-mini-2024-07-18",
    "knowledgebase": "ki-dev-large",
    "seed": 1055,
    "client_id": 1726667082838,
    "all_context": false,
    "follow_up_for": null,
    "knowledgebase_files_count": 3,
    "override_command": null,
    "disable_clarity_check": false,
    "custom_primer": "",
    "logging": true,
    "query_route": ""
}


INITIALIZATION
Knowledgebase: ki-dev-large
Base Query: can you summarize in 3 sentence what is discrete math?
Model: gpt-4o-mini-2024-07-18
Use Curl?: None
==================================================
**Elapsed Time: 0.00 seconds**
==================================================


ROUTING
Query type: summary
==================================================
**Elapsed Time: 2.10 seconds**
==================================================


RAG PARAMETERS
Max Context To Include: 100
Lowest Score to Consider: 0.1
==================================================
**Elapsed Time: 0.14 seconds**
==================================================


VECTOR SEARCH ALGORITHM TO USE 
Use MMR search?: True
Use Similarity search?: False
==================================================
**Elapsed Time: 0.10 seconds**
==================================================


VECTOR SEARCH DONE 
==================================================
**Elapsed Time: 6.36 seconds**
==================================================


PRIMER 
Primer: 
            You are Simon, a highly intelligent personal assistant in a system called KIOS. You are a chatbot that
            can read knowledgebases through the "CONTEXT" that is included in the user's chat message.

        
            Your role is to act as an expert at summarization and analysis.

            In your responses to enterprise users, prioritize clarity, trustworthiness, and appropriate formality.

            Be honest by admitting when a topic falls outside your scope of knowledge, and suggest
            alternative avenues for obtaining information when necessary.

            Make effective use of chat history to avoid redundancy and enhance response relevance, continuously
            adapting to integrate all necessary details in your interactions.

            Use as much tokens as possible to provide a detailed response.
        
==================================================
**Elapsed Time: 0.19 seconds**
==================================================


FINAL QUERY 
Final Query: CONTEXT: ##########
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 1

Context: # A Cool Brisk Walk  
Through Discrete Mathematics

**Version 2.2.1**  
**Stephen Davies, Ph.D.**  

## Table of Contents
1. Introduction
2. Concepts
   - Sets
   - Functions
   - Relations
3. Applications
4. Conclusion

## Introduction
In this document, we explore various topics under discrete mathematics. 

## Concepts

### Sets
- Definition
- Notation
- Operations

### Functions
- Definition
- Types of functions
- Examples

### Relations
- Definition
- Properties

| Concept   | Description                       |
|-----------|-----------------------------------|
| Set      | A collection of distinct objects   |
| Function  | A relation from a set to itself    |
| Relation  | A connection between two sets       |

## Applications
The concepts of discrete mathematics can be applied in various fields, including computer science, information technology, and combinatorial design.

## Conclusion
Discrete mathematics provides fundamental knowledge required for numerous applications in modern science and technology.

Image Analysis: 

### Analysis of the Attached Visual Content

#### 1. Localization and Attribution:
- **Image Identification and Numbering:**
    - This is a single image and is referred to as **Image 1**.

#### 2. Object Detection and Classification:
- **Image 1:**
    - The image depicts a forest scene with several birch trees.
    - **Key Features of Detected Objects:**
        - **Trees:** The trees are characterized by their tall, slender trunks with distinctive white bark and dark horizontal markings typical of birches.
        - **Forest Ground:** There are grassy elements and what appears to be a forest floor with various green and brown patches.

#### 3. Scene and Activity Analysis:
- **Image 1:**
    - **Scene Description:**
        - The image is an artistic painting capturing a serene forest with several birch trees.
        - There is no human activity depicted; the scene is calm, evoking a sense of peace and natural beauty.

#### 4. Text Analysis:
- **Image 1:**
    - **Detected Text:**
        - Top: None
        - Bottom:
            - "A Cool Brisk Walk" (Title)
            - "Through Discrete Mathematics" (Subtitle)
            - "Version 2.2.1" (Version Information)
            - "Stephen Davies, Ph.D." (Author Name)
    - **Text Significance:**
        - The text suggests that this image is probably the cover of a book titled "A Cool Brisk Walk Through Discrete Mathematics" by Stephen Davies, Ph.D. The version number indicates it is version 2.2.1, signifying a possibly updated or revised edition.

#### 8. Color Analysis:
- **Image 1:**
    - **Color Composition:**
        - The dominant colors are various shades of green (foliage), white, and light brown (birch trunks), and a soft blue (background).
    - **Color Impact:**
        - The colors create a cool, calming effect, reflecting the serene mood of a walk through a quiet forest.

#### 9. Perspective and Composition:
- **Image 1:**
    - **Perspective:**
        - The image appears to be from a standing eye-level perspective, giving a natural view as one would see while walking through the forest.
    - **Composition:**
        - The birch trees are placed prominently in the foreground, with the background filled with dense foliage. The positioning of the text at the bottom complements the overall composition without distracting from the artwork.

#### 10. Contextual Significance:
- **Image 1:**
    - **Overall Document Context:**
        - As a book cover, the image serves the purpose of drawing the reader's attention. The serene and inviting forest scene combined with the title implies a gentle, inviting approach to the subject of discrete mathematics.

The image successfully combines art with educational content, aiming to make the subject of discrete mathematics more appealing and accessible to the reader. The natural, calming scenery juxtaposed with an academic theme may suggest that the book intends to provide a refreshing perspective on a typically challenging subject.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 3

Context: # Contents at a glance

## Contents at a glance
- **Contents at a glance** …… i
- **Preface** …… iii
- **Acknowledgements** …… v
  
1. **Meetup at the trailhead** …… 1
2. **Sets** …… 7
3. **Relations** …… 35
4. **Probability** …… 59
5. **Structures** …… 85
6. **Counting** …… 141
7. **Numbers** …… 165
8. **Logic** …… 197
9. **Proof** …… 223

Also be sure to check out the forever-free-and-open-source instructional videos that accompany this series at [www.allthemath.org](http://www.allthemath.org)!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 5

Context: # Preface

Discrete math is a popular book topic — start Googling around and you’ll find a zillion different textbooks about it. Take a closer look, and you’ll discover that most of these are pretty thick, dense volumes packed with lots of equations and proofs. They’re principled approaches, written by mathematicians and (seemingly) to mathematicians. I speak with complete frankness when I say I’m comforted to know that the human race is well covered in this area. We need smart people who can derive complex expressions and prove theorems from scratch, and I’m glad we have them.

Your average computer science practitioner, however, might be better served by a different approach. There are elements to the discrete math mindset that a budding software developer needs experience with. This is why discrete math is (properly, I believe) part of the mandatory curriculum for most computer science undergraduate programs. But for future programmers and engineers, the emphasis should be different than it is for mathematicians and researchers in computing theory. A practical computer scientist mostly needs to be able to use these tools, not to derive them. She needs familiarity, and practice, with the fundamental concepts and the thought processes they involve. The number of times the average software developer will need to construct a proof in graph theory is probably near zero. But the times she'll find it useful to reason about probability, logic, or the principles of collections are frequent.

I believe the majority of computer science students benefit most from simply gaining an appreciation for the richness and rigor of
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 6

Context: # PREFACE

This material, what it means, and how it impacts their discipline. Becoming an expert theorem prover is not required, nor is deriving closed-form expressions for the sizes of trees with esoteric properties. Basic fluency with each topic area, and an intuition about when it can be applied, is the proper aim for most of those who would go forward and build tomorrow's technology.

To this end, the book in your hands is a quick guided tour of introductory-level discrete mathematics. It's like a cool, brisk walk through a pretty forest. I point out the notable features of the landscape and try to instill a sense of appreciation and even of awe. I want the reader to get a feel for the lay of the land and a little exercise. If the student acquires the requisite vocabulary, gets some practice playing with the toys, and learns to start thinking in terms of the concepts here described, I will count it as a success.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 9

Context: # Chapter 1

## Meetup at the trailhead

Before we set out on our “cool, brisk walk,” let’s get oriented. What is discrete mathematics, anyway? Why is it called that? What does it encompass? And what is it good for?

Let’s take the two words of the subject, in reverse order. First, **math**. When most people hear “math,” they think “numbers.” After all, isn’t math the study of quantity? And isn’t that the class where we first learned to count, add, and multiply?

Mathematics certainly has its root in the study of numbers — specifically, the “natural numbers” (the integers from 1 on up) that fascinated the ancient Greeks. Yet math is broader than this, almost to the point where numbers can be considered a special case of something deeper. In this book, when we talk about trees, sets, or formal logic, there might not be a number in sight.

Math is about **abstract, conceptual objects that have properties,** and the implications of those properties. An “object” can be any kind of “thought material” that we can define and reason about precisely. Much of math deals with questions like, “Suppose we defined a certain kind of thing that had certain attributes. What would be the implications of this, if we reasoned it all the way out?” The “thing” may or may not be numerical, whatever it turns out to be. Like a number, however, it will be crisply defined, have certain known aspects to it, and be capable of combining with other things in some way.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 10

Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD

Fundamental to math is that it deals with the **abstract**. Abstract, which is the opposite of concrete, essentially means something that can’t be perceived with the senses. A computer chip is concrete: you can touch it, you can see it. A number is not; nor is a function, a binary tree, or a logical implication. The only way to perceive these things is with the power of the mind. We will write expressions and draw pictures of many of our mathematical structures in order to help visualize them, and nearly everything we study will have practical applications whereby the abstractness gets grounded in concreteness for some useful purpose. But the underlying mathematical entity remains abstract and ethereal — only accessible to the mind’s eye. We may use a pencil to form the figure “5” on a piece of paper, but that is only a concrete manifestation of the underlying concept of “five-ness.” Don’t mistake the picture or the symbol for the thing itself, which always transcends any mere physical representation.

The other word in the name of our subject is **“discrete”** (not to be confused with “discreet,” which means something else entirely). The best way to appreciate what discrete means is to contrast it with its opposite, **continuous**. Consider the following list:

| Discrete                      | Continuous             |
|-------------------------------|------------------------|
| whole numbers (\( \mathbb{Z} \))  | real numbers (\( \mathbb{R} \)) |
| int                           | double                 |
| digital                       | analog                 |
| quantum                       | continuum              |
| counting                      | measuring              |
| number theory                 | analysis               |
| \( \Sigma \)                  | \( \int \frac{d}{dx} \) |

What do the left-hand entries have in common? They describe things that are measured in crisp, distinct intervals, rather than varying smoothly over a range. Discrete things jump suddenly from position to position, with rigid precision. If you’re 5 feet tall, you might some day grow to 5.3 feet; but though there might be 5
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 11

Context: # Understanding Integration and Differentiation

People in your family, there will never be 5.3 of them (although there could be 6 someday).

The last couple of entries on this list are worth a brief comment. They are math symbols, some of which you may be familiar with.

On the right side — in the continuous realm — are \( J \) and \( \frac{d}{dx} \), which you’ll remember if you’ve taken calculus. They stand for the two fundamental operations of integration and differentiation. Integration, which can be thought of as finding “the area under a curve,” is basically a way of adding up a whole infinite bunch of numbers over some range. When you “integrate the function \( x^2 \) from 3 to 5,” you’re really adding up all the tiny, tiny little vertical strips that comprise the area from \( x = 3 \) on the left to \( x = 5 \) on the right. Its corresponding entry in the left-column of the table is \( \Sigma \), which is just a short-hand for “sum up a bunch of things.” Integration and summation are equivalent operations; it’s just that when you integrate, you’re adding up all the (infinitely many) slivers across the real-line continuum. When you sum, you’re adding up a fixed sequence of entries, one at a time, like in a loop. \( \Sigma \) is just the discrete “version” of \( J \).

The same sort of relationship holds between ordinary subtraction (\(-\)) and differentiation (\(\frac{d}{dx}\)). If you’ve plotted a bunch of discrete points on \( x \)-\( y \) axes, and you want to find the slope between two of them, you just subtract their \( y \) values and divide by the \( x \) distance between them. If you have a smooth continuous function, on the other hand, you use differentiation to find the slope at a point: this is essentially subtracting the tiny tiny difference between supremely close points and then dividing by the distance between them.

Don’t worry; you don’t need to have fully understood any of the integration or differentiation stuff I just talked about, or even to have taken calculus yet. I’m just trying to give you some feel for what “discrete” means, and how the dichotomy between discrete and continuous really runs through all of math and computer science. In this book, we will mostly be focusing on discrete values and structures, which turn out to be of more use in computer science. That’s partially because, as you probably know, computers
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 12

Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD

Themselves are discrete, and can only store and compute discrete values. There can be many of them — megabytes, gigabytes, terabytes — but each value stored is fundamentally comprised of bits, each of which has a value of either 0 or 1. This is unlike the human brain, by the way, whose neuronal synapses communicate based on the continuous quantities of chemicals present in their axons. So I guess “computer” and “brain” are another pair of entities we could add to our discrete vs. continuous list.

There’s another reason, though, why discrete math is of more use to computer scientists than continuous math is, beyond just the bits-and-bytes thing. Simply put, computers operate algorithmically. They carry out programs in step-by-step, iterative fashion. First do this, then do that, then move on to something else. This mechanical execution, like the ticking of a clock, permeates everything the computer can do, and everything we can tell it to do. At a given moment in time, the computer has completed step 7, but not step 8; it has accumulated 38 values, but not yet 39; its database has exactly 15 entries in it, no more and no less; it knows that after accepting this friend request, there will be exactly 553 people in your set of friends. The whole paradigm behind reasoning about computers and their programs is discrete, and that’s why we computer scientists find different problems worth thinking about than most of the world did a hundred years ago.

But it’s still math. It’s just discrete math. There’s a lot to come, so limber up and let me know when you’re ready to hit the road.

## 1.1 Exercises

Use an index card or a piece of paper folded lengthwise, and cover up the right-hand column of the exercises below. Read each exercise in the left-hand column, answer it in your mind, then slide the index card down to reveal the answer and see if you’re right! For every exercise you missed, figure out why you missed it before moving on.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 13

Context: # 1.1 EXERCISES

1. **What’s the opposite of concrete?**  
   Abstract.

2. **What’s the opposite of discrete?**  
   Continuous.

3. **Consider a quantity of water in a glass. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Concrete, since it’s a real entity you can experience with the senses. Continuous, since it could be any number of ounces (or liters, or tablespoons, or whatever). The amount of water certainly doesn’t have to be an integer. (Food for thought: since all matter is ultimately comprised of atoms, are even substances like water discrete?)

4. **Consider the number 27. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Abstract, since you can’t see or touch or smell “twenty-seven.” Probably discrete, since it’s an integer, and when we think of whole numbers we think “discrete.” (Food for thought: in real life, how would you know whether I meant the integer “27” or the decimal number “27.00”? And does it matter?) Clearly it’s discrete. Abstract vs. concrete, though, is a little tricky. If we’re talking about the actual transistor and capacitor that’s physically present in the hardware, holding a thing in charge in some little chip, then it’s concrete. But if we’re talking about the value “1” that is conceptually part of the computer’s currently executing state, then it’s really abstract just like 27 was. In this book, we’ll always be talking about bits in this second, abstract sense.

5. **Consider a bit in a computer’s memory. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Any kind of abstract object that has properties we can reason about. 

6. **If math isn’t just about numbers, what else is it about?**  
   Any kind of abstract object that has properties we can reason about.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 15

Context: # Chapter 2

## Sets

The place from which we'll start our walk is a body of mathematics called "set theory." Set theory has an amazing property: it's so simple and applicable that almost all the rest of mathematics can be based on it! This is all the more remarkable because set theory itself came along pretty late in the game (as things go) — it was singlehandedly invented by one brilliant man, Georg Cantor, in the 1870s. That may seem like a long time ago, but consider that by the time Cantor was born, mankind had already accumulated an immense wealth of mathematical knowledge: everything from geometry to algebra to calculus to prime numbers. Set theory was so elegant and universal, though, that after it was invented, nearly everything in math was redefined from the ground up to be couched in the language of sets. It turns out that this simple tool is an amazingly powerful way to reason about mathematical concepts of all flavors. Everything else in this book stands on set theory as a foundation.

Cantor, by the way, went insane as he tried to extend set theory to fully encompass the concept of infinity. Don't let that happen to you!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 17

Context: # 2.1. THE IDEA OF A SET

is the same whether we write it like this:

\( P = \{ \text{Dad}, \text{Mom} \} \)

or this:

\( P = \{ \text{Mom}, \text{Dad} \} \)

Those are just two different ways of writing the same thing.

The set \( E \) that had nobody in it can be written like this, of course:

\( E = \{ \} \)

but we sometimes use this special symbol instead:

\( E = \emptyset \)

However you write it, this kind of set (one that has no elements) is referred to as an **empty set**.

The set \( H \), above, contained all the members of the group under consideration. Sometimes we'll refer to "the group under consideration" as the **domain of discourse**. It too is a set, and we usually use the symbol \( \Omega \) to refer to it. So in this case,

\( \Omega = \{ \text{Mom}, \text{Johnny}, \text{T.J.}, \text{Dad}, \text{Lizzy} \} \)

Another symbol we'll use a lot is \( \in \), which means "is a member of." Since Lizzy is a female, we can write:

\( \text{Lizzy} \in F \)

to show that Lizzy is a member of the set \( F \). Conversely, we write:

\( \text{T.J.} \notin F \)

to show that T.J. is not.

As an aside, I mentioned that every item is either in, or not in, a set: there are no shades of gray. Interestingly, researchers have developed another body of mathematics called [I kid you not] *fuzzy*[^1] set theory.

[^1]: Some authors use the symbol \( U \) for this, and call it the **universal set**.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 18

Context: # CHAPTER 2. SETS

Set theory? Fuzzy sets change this membership assumption: items can indeed be “partially in” a set. One could declare, for instance, that Dad is “100% female,” which means he’s only 100% in the \( F \) set. That might or might not make sense for gender, but you can imagine that if we defined a set \( T \) of “the tall people,” such a notion might be useful. At any rate, this example illustrates a larger principle which is important to understand: in math, things are the way they are simply because we’ve decided it’s useful to think of them that way. If we decide there’s a different useful way to think about them, we can define new assumptions and proceed according to new rules. It doesn’t make any sense to say “sets are (or aren’t) really fuzzy”: because there is no “really.” All mathematics proceeds from whatever mathematicians have decided is useful to define, and any of it can be changed at any time if we see fit.

## 2.2 Defining sets

There are two ways to define a set: extensionally and intensionally. I’m not saying there are two kinds of sets; rather, there are simply two ways to specify a set.

To define a set extensionally is to list its actual members. That’s what we did when we said \( P = \{ Dad, Mom \} \) above. In this case, we’re not giving any “meaning” to the set; we’re just mechanically spelling out what’s in it. The elements Dad and Mom are called the **extension** of the set \( P \).

The other way to specify a set is intensionally, which means to describe its meaning. Another way to think of this is specifying a rule by which it can be determined whether or not a given element is in the set. If I say “Let \( P \) be the set of all parents,” I am defining \( P \) intensionally. I haven’t explicitly said which specific elements of the set are in \( P \). I’ve just given the meaning of the set, from which you can figure out the extension. We call “parent-ness” the **intension** of \( P \).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 21

Context: 2.4 Sets are not arrays
=======================

If you’ve done some computer programming, you might see a resemblance between sets and the collections of items often used in a program: arrays, perhaps, or linked lists. To be sure, there are some similarities. But there are also some very important differences, which must not be overlooked:

- **No order.** As previously mentioned, there is no order to the members of a set. “{Mom, Dad}” is the same set as “{Dad, Mom}”. In a computer program, of course, most arrays or lists have first, second, and last elements, and an index number assigned to each.

- **No duplicates.** Suppose \( M \) is the set of all males. What would it possibly mean to say \( M = \{T.J., T.J., Johnny\} \)? Would that mean that “T.J. is twice the man that Johnny is”? This is obviously nonsensical. The set \( M \) is based on a property: maleness. Each element of \( M \) is either male, or it isn't. It can't be “male three times.” Again, in an array or linked list, you could certainly have more than one copy of the same item in different positions.

- **Infinite sets.** 'Nuff said. I’ve never seen an array with infinitely many elements, and neither will you.

- **Untyped.** Most of the time, an array or other collection in a computer program contains elements of only a single type; it’s an array of integers, or a linked list of Customer objects, for example. This is important because the program often needs to treat all elements in the collection the same way. Perhaps it needs to loop over the array to add up all the numbers, or iterate through a linked list and search for customers who have not placed an order in the last six months.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 23

Context: # 2.6 Sets of Sets

Sets are heterogeneous — a single set can contain four universities, seven integers, and an image — and so it might occur to you that they can contain other sets as well. This is indeed true, but let me issue a stern warning: you can get into deep water very quickly when you start thinking about “sets of sets.” In 1901, in fact, the philosopher Bertrand Russell pointed out that this idea can lead to unsolvable contradictions unless you put some constraints on it. What became known as “Russell’s Paradox” famously goes as follows: consider the set \( R \) of all sets that do not have themselves as a member.

For this reason, we’ll be very careful to use curly braces to denote sets, and parentheses to denote ordered pairs.

By the way, although the word “coordinate” is often used to describe the elements of an ordered pair, that’s really a geometry-centric word that implies a visual plot of some kind. Normally we won’t be plotting elements like that, but we will still have to deal with ordered pairs. I’ll just call the constituent parts “elements” to make it more general.

Three-dimensional points need ordered triples \( (x, y, z) \), and it doesn’t take a rocket scientist to deduce that we could extend this to any number of elements. The question is what to call them, and you do sort of sound like a rocket scientist (or other generic nerd) when you say tuple. Some people rhyme this word with “Drupal,” and others with “couple,” by the way, and there seems to be no consensus. If you have an ordered pair-type thing with 5 elements, therefore, it’s a 5-tuple (or a quintuple). If it has 117 elements, it’s a 117-tuple, and there’s really nothing else to call it. The general term (if we don’t know or want to specify how many elements) is n-tuple. In any case, it’s an ordered sequence of elements that may contain duplicates, so it’s very different than a set.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 24

Context: # CHAPTER 2. SETS

as members³. Now is \( R \) a member of itself, or isn’t it? Either way you answer turns out to be wrong (try it!) which means that this whole setup must be flawed at some level.

The good news is that as long as you don’t deal with this kind of self-referential loop (“containing yourself as a member”) then it’s pretty safe to try at home. Consider this set:

\[
V = \{ 3, 5, \{ 5, 4 \}, 2 \}
\]

This set has four (not five) members. Three of \( V \)’s members are integers: 2, 3, and 5. The other one is a set (with no name given). That other set, by the way, has two members of its own: 4 and 5. If you were asked, “is \( 4 \in V \)?” the answer would be no.

As a corollary to this, there’s a difference between

\[
\emptyset
\]

and 

\[
\{ \emptyset \}
\]

The former is a set with no elements. The latter is a set with one element: and that element just happens to be a set with nothing in it.

## 2.7 Cardinality

When we talk about the number of elements in a set, we use the word cardinality. You’d think we could just call it the “size” of the set, but mathematicians sometimes like words that sound cool. The cardinality of \( M \) (the set of males, where the Davies family is the domain of discourse) is 3, because there are three elements in it. The cardinality of the empty set \( \emptyset \) is 0. The cardinality of the set of all integers is \( \aleph_0 \). Simple as that.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 25

Context: 2.8 Some special sets
=====================

In addition to the empty set, there are symbols for some other common sets, including:

- **Z** — the integers (positive, negative, and zero)
- **N** — the natural numbers (positive integers and zero)
- **Q** — the rational numbers (all numbers that can be expressed as an integer divided by another integer)
- **R** — the real numbers (all numbers that aren't imaginary, even decimal numbers that aren't rational)

The cardinality of all these sets is infinity, although, as I alluded to previously, \(\mathbb{R}\) is in some sense "greater than" \(\mathbb{N}\). For the curious, we say that \(\mathbb{N}\) is a countably infinite set, whereas \(\mathbb{R}\) is uncountably infinite. Speaking very loosely, this can be thought of this way: if we start counting up all the natural numbers, 0, 1, 2, 3, 4, ..., we will never get to the end of them. But at least we can start counting. With the real numbers, we can't even get off the ground. Where do you begin? Starting with 0 is fine, but then what's the "next" real number? Choosing anything for your second number inevitably skips a lot in between. Once you've digested this, I'll spring another shocking truth on you: \(|\mathbb{Q}| \text{ is actually equal to } |\mathbb{R}|, \text{ not greater than it as } |\mathbb{R}|. \text{ Cantor came up with an ingenious numbering scheme whereby all the rational numbers — including 3, -9, \text{ and } \frac{1}{17} — can be listed off regularly, in order, just like the integers can. And so } |\mathbb{Q}| = |\mathbb{N}| \neq |\mathbb{R}|. \text{ This kind of stuff can blow your mind.}
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 27

Context: # 2.9. COMBINING SETS

\( B \) is “all the things *period* that aren't in \( B' \).” Of course, “all the things period” means “all the things that we're currently talking about.” The domain of discourse \( \Omega \) is very important here. If we're talking about the Davies family, we would say that \( M = \{ \text{Mom}, \text{Lizzy} \} \), because those are all the Davies who aren’t male. If, on the other hand, \( \Omega \) is “the grand set of absolutely everything,” then not only is Mom a member of \( M \), but so is the number 12, the French Revolution, and my nightmare last Tuesday about a rabid platypus.

- **Cartesian product** (\( \times \)). Looks like multiplication, but very different. When you take the Cartesian product of two sets \( A \) and \( B \), you don’t even get the elements from the sets in the result. Instead, you get ordered pairs of elements. These ordered pairs represent each combination of an element from \( A \) and an element from \( B \). For instance, suppose \( A = \{ \text{Bob}, \text{Dave} \} \) and \( B = \{ \text{Jenny}, \text{Gabrielle}, \text{Tiffany} \} \). Then:

\[
A \times B = \{ (\text{Bob}, \text{Jenny}), (\text{Bob}, \text{Gabrielle}), (\text{Bob}, \text{Tiffany}), (\text{Dave}, \text{Jenny}), (\text{Dave}, \text{Gabrielle}), (\text{Dave}, \text{Tiffany}) \}
\]

Study that list. The first thing to realize is that it consists of neither guys nor girls, but of ordered pairs. (Clearly, for example, \( \text{Jenny} \in A \times B \).) Every guy appears exactly once with every girl, and the guy is always the first element of the ordered pair. Since we have two guys and three girls, there are six elements in the result, which is an easy way to remember the \( x \) sign that represents Cartesian product. (Do not worry, make the common mistake of thinking that \( A \times B \) is \( A \) or \( B \) as a set, not a number. The cardinality of the set, of course, is \( 6 \), so it's appropriate to write \( |A \times B| = 6 \).)

## Laws of Combining Sets

There are a bunch of handy facts that arise when combining sets using the above operators. The important thing is that these are all easily seen just by thinking about them for a moment. Put another way, these aren’t facts to memorize; they’re facts to look at and see.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 28

Context: # CHAPTER 2. SETS

for yourself. They’re just a few natural consequences of the way we’ve defined sets and operations, and there are many others.

- **Union and intersection are commutative.** As noted above, it’s easy to see that \( A \cup B \) will always give the same result as \( B \cup A \). Same goes for \( \cap \), though.

- **Union and intersection are associative.** “Associative” means that if you have an operator repeated several times, left to right, it doesn’t matter which order you evaluate them in. \( (A \cup B) \cup C \) will give the same result as \( A \cup (B \cup C) \). This means we can freely write expressions like \( X \cup Y \cup Z \) and no one can accuse us of being ambiguous. This is also true if you have three (or more) intersections in a row. Be careful, though: associativity does not hold if you have unions and intersections mixed together. If I write \( A \cap B \cap C \) it matters very much whether I do the union first or the intersection first. This is just how it works with numbers: \( 4 + 3 \) gives either \( 10 \) or \( 14 \) depending on the order of operations. In algebra, we learned that \( x \) has precedence over \( + \), and you’ll always do that one first in the absence of parentheses. We could establish a similar order for set operations, but we won’t: we’ll always make it explicit with parentheses.

- **Union and intersection are distributive.** You’ll recall from basic algebra that \( a \cdot (b + c) = ab + ac \). Similarly with sets,

\[ X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z) \]

It's important to work this out for yourself rather than just memorize it as a rule. Why does it work? Well, take a concrete example. Suppose \( Y \) is the set of all female students, \( S \) is the set of all computer science majors, and \( Z \) is the set of all math majors. (Some students, of course, double-major in both.) The left-hand side of the equals sign says “first take all the math and computer science majors and put them in a group. Then, intersect that group with the women’s group.” The result is “women who are either computer science majors or math majors (or both).” Now
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 30

Context: # CHAPTER 2. SETS

Let \( X \) be the set of all known murderers. Let \( Y \) be the set of all known thieves. 

Again, it's best understood with a specific example. Let’s say you’re renting a house and want to make sure you don’t have any shady characters under the roof. 

Now, as a landlord, you don’t want any thieves or murderers renting your property. So who are you willing to rent to? Answer: if \( \Omega \) is the set of all people, you are willing to rent to \( X \cup Y \).

Why that? Because if you take \( X \cup Y \), that gives you all the undesirables: people who are either murderers or thieves (or both). You don’t want to rent to any of them. In fact, you want to rent to the complement of that set; namely, “anybody else.” Putting an overbar on that expression gives you all the non-thieves and non-murderers.

Very well. But now look at the right-hand side of the equation. \( X \) gives you the non-thieves. \( Y \) gives you the non-murderers. Now in order to get acceptable people, you want to rent only to someone who’s in both groups. Put another way, they have to be both a non-thief and a non-murderer in order for you to rent to them. Therefore, they must be in the intersection of the non-thief group and the non-murderer group. Therefore, the two sides of this equation are the same.

The other form of De Morgan’s law is stated by flipping the intersections and unions:

\[
X \cap Y = \overline{X \cup Y}
\]

Work this one out for yourself using a similar example, and convince yourself it’s always true.

Augustus De Morgan, by the way, was a brilliant 19th century mathematician with a wide range of interests. His name
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 31

Context: 2.10 Subsets
===========

We learned that the “∈” symbol is used to indicate set membership: the element on the left is a member of the set on the right. A related but distinct notion is the idea of a subset. When we say \( X \subset Y \) (pronounced “X is a subset of Y”), it means that every member of \( X \) is also a member of \( Y \). The reverse is not necessarily true, of course; otherwise “⊆” would just mean “=”. So if \( A = \{ Dad, Lizzy \} \) and \( K = \{ Dad, Mom, Lizzy \} \), then we can say \( A \subset K \).

Be careful about the distinction between “∈” and “⊆”, which are often confused. With “∈”, the thing on the left is an element, whereas with “⊆”, the thing on the left is a set. This is further complicated by the fact that the element on the left-hand side of “∈” might well be a set.

Let's see some examples. Suppose \( Q \) is the set \( \{ 4, 9, 4 \} \).  \( Q \) has three elements here, one of which is itself a set. Now suppose that we let \( P \) be the set \( \{ 4, 9 \} \). Question: Is \( P \subset Q \)? The answer is yes: the set \( \{ 4, 9 \} \) (which is the same as the set \( \{ 9, 4 \} \), just written in a different way) is in fact an element of the set \( Q \). Next question: Is \( P \subset Q \)? The answer is no, \( P \nsubseteq Q \). If there are two of them (4 and 4) also in \( Q \), then 4 is a member of \( Q \), not 9. Last question: If \( R \) is defined to be \( \{ 2, 4 \} \), is \( R \subset Q \)? The answer is yes, since both 2 and 4 are members of \( Q \).

Notice that the definition here is a subset of itself. Sometimes, though, it’s useful to talk about whether a set is really a subset of another, and you don’t want to “count” if the two sets are actually equal. This is called a proper subset, and the symbol for it is \( \subsetneq \). You can see the rationale for the choice of symbol, because “⊆” is kind of like “<” for numbers, and “⊂” is like “<”.

Every set is a subset (not necessarily a proper one) of itself, because
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 33

Context: ```markdown
## 2.11. POWER SETS

Don't lose sight. There are four elements to the power set of \( A \), each of which is one of the possible subsets. It might seem strange to talk about "all of the possible subsets"—when I first learned this stuff, I remember thinking at first that there would be no limit to the number of subsets you could make from a set. But of course there is. To create a subset, you can either include, or exclude, each one of the original set's members. In \( A \)’s case, you can either (1) include both Dad and Lizzy, or (2) include Dad but not Lizzy, or (3) include Lizzy but not Dad, or (4) exclude both, in which case your subset is empty. Therefore, \( P(A) \) includes all four of those subsets.

Now what's the cardinality of \( P(X) \) for some set \( X \)? That's an interesting question, and one well worth pondering. The answer ripples through the heart of a lot of combinatorics and the binary number system, topics we'll cover later. And the answer is right at our fingertips if we just extrapolate from the previous example. To form a subset of \( X \), we have a choice to either include, or else exclude, each of its elements. So there's two choices for the first element, and then whether we choose to include or exclude that first element, there are two choices for the second, etc. 

So if \( |X| = 2 \) (recall that this notation means \( X \) has two elements) or \( X \) has a cardinality of \( 2^n \), then its power set has \( 2 \times 2 \) members. If \( |X| = 3 \), then its power set has \( 2 \times 2 \times 2 \) members. In general:

\[
|P(X)| = 2^{|X|}.
\]

As a limiting case (and a brain-bender), notice that if \( X \) is the empty set, then \( P(X) \) has one (not zero) members, because there is in fact one subset of the empty set: namely, the empty set itself. So \( |X| = 0 \), and \( |P(X)| = 1 \). And that jives with the above formula.
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 34

Context: # 2.12 Partitions

Finally, there’s a special variation on the subset concept called a **partition**. A partition is a group of subsets of another set that together are both **collectively exhaustive** and **mutually exclusive**. This means that every element of the original set is in **one and only one** of the sets in the partition. Formally, a partition of \( X \) is a group of sets \( X_1, X_2, \ldots, X_n \) such that:

\[ 
X_1 \cup X_2 \cup \cdots \cup X_n = X, 
\]

and 

\[ 
X_i \cap X_j = \emptyset \quad \text{for all } i, j. 
\]

So let’s say we’ve got a group of subsets that are supposedly a partition of \( X \). The first line above says that if we combine the contents of all of them, we get everything that’s in \( X \) (and nothing more). This is called being collectively exhaustive. The second line says that no two of the sets have anything in common; they are mutually exclusive.

As usual, an example is worth a thousand words. Suppose the set \( D \) is \(\{ \text{Dad, Mom, Lizzy, T.J., Johnny} \}\). A partition is any way of dividing \( D \) into subsets that meet the above conditions. One such partition is:

- \(\{ \text{Lizzy, T.J.} \}\),
- \(\{ \text{Mom, Dad} \}\), 
- \(\{ \text{Johnny} \}\).

Another one is:

- \(\{ \text{Lizzy} \}\),
- \(\{ \text{T.J.} \}\), 
- \(\{ \text{Mom} \}\),
- \(\{ \text{Johnny} \}\).

Yet another is:

- \(\emptyset\),
- \(\{ \text{Lizzy, T.J., Johnny, Mom, Dad} \}\), and 
- \(\emptyset\).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 36

Context: # CHAPTER 2: SETS

might be double-majoring in both. Hence, this group of subsets is neither mutually exclusive nor collectively exhaustive. It’s interesting to think about gender and partitions: when I grew up, I was taught that males and females were a partition of the human race. But now I’ve come to realize that there are non-binary persons who do not identify with either of those genders, and so it’s not a partition after all.

## Question:
Is the number of students \( |S| \) equal to the number of off-campus students plus the number of on-campus students? Obviously yes. But why? The answer: because the off-campus and on-campus students form a partition. If we added up the number of freshmen, sophomores, juniors, and seniors, we would also get \( |S| \). But adding up the number of computer science majors and English majors would almost certainly not be equal to \( |S| \), because some students would be double-counted and others counted not at all. This is an example of the kind of beautiful simplicity that partitions provide.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 37

Context: 2.13 Exercises
================

Use an index card or a piece of paper folded lengthwise, and cover up the right-hand column of the exercises below. Read each exercise in the left-hand column, answer it in your mind, then slide the index card down to reveal the answer and see if you're right! For every exercise you missed, figure out why you missed it before moving on.

1. Is the set \( \{ Will, Smith \} \) the same as the set \( \{ Smith, Will \} \)?

   Yes indeed.

2. Is the ordered pair \( (Will, Smith) \) the same as \( (Smith, Will) \)?

   No. Order matters with ordered pairs (hence the name), and with any size tuple for that matter.

3. Is the set \( \{ Luke, Leia \}, Han \) the same as the set \( \{ Luke, Leia, Han \} \)?

   No. For instance, the first set has Han as a member but the second does not. (Instead, it has another set as a member, and that inner set happens to include Han.)

4. What's the first element of the set \( \{ Cowboys, Redskins, Steelers \} \)?

   The question doesn't make sense. There is no "first element" of a set. All three teams are equally members of the set and could be listed in any order.

5. Let \( G = \{ Matthew, Mark, Luke, John \} \) be \( \{ Luke, Obi-wan, Yoda \} \). \( S \) be the set of all Star Wars characters, and \( F \) be the four gospels from the New Testament. Now then. Is \( J \subseteq G \)?

   No.

6. Is \( J \subseteq S \)?

   Yes.

7. Is Yoda \( \in J \)?

   Yes.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 39

Context: # 2.13. EXERCISES

21. What’s \( A \cap B \)?  
   \{ Macbeth, Hamlet \}.

22. What’s \( B \cap T \)?  
   ∅.

23. What’s \( B \cap \overline{T} \)?  
   B. (which is { Scrabble, Monopoly, Othello. })

24. What’s \( A \cup (B \cap T) \)?  
   \{ Hamlet, Othello, Macbeth \}.

25. What’s \( (A \cup B) \cap T \)?  
   \{ Macbeth, Hamlet \}.

26. What’s \( A \setminus B \)?  
   Simply \( T \), since these two sets have nothing in common.

27. What’s \( T' \setminus B \)?  
   \{ (Hamlet, Macbeth), (Hamlet, Hamlet), (Hamlet, Othello), (Village, Macbeth), (Village, Hamlet), (Village, Othello), (Town, Hamlet), (Town, Othello) \}. The order of the ordered pairs within the set is not important; the order of the elements within each ordered pair is important.

28. What’s \( T \cap X \)?  
   0.

29. What’s \( (B \cap B) \times (A \cap T) \)?  
   \{ (Scrabble, Hamlet), (Monopoly, Hamlet), (Othello, Hamlet) \}.

30. What’s \( I \cup B \cup T \)?  
   7.

31. What’s \( I \cap (A \cap T) \)?  
   21. (The first parenthesized expression gives rise to a set with 7 elements, and the second to a set with three elements (B itself). Each element from the first set gets paired with an element from the second, so there are 21 such pairings.)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 40

Context: # CHAPTER 2. SETS

### 33. Is A an extensional set, or an intensional set?
The question doesn't make sense. Sets aren't "extensional" or "intensional"; rather, a given set can be described extensionally or intensionally. The description given in item 19 is an extensional one; an intensional description of the same set would be "The Shakespeare tragedies Stephen studied in high school."

### 34. Recall that G was defined as { Matthew, Mark, Luke, John }. Is this a partition of G?
- { Luke, Matthew }
- { John }

No, because the sets are not collectively exhaustive (Mark is missing).

### 35. Is this a partition of G?
- { Mark, Luke }
- { Matthew, Luke }

No, because the sets are neither collectively exhaustive (John is missing) nor mutually exclusive (Luke appears in two of them).

### 36. Is this a partition of G?
- { Matthew, Mark, Luke }
- { John }

Yes. (Trivia: this partitions the elements into the synoptic gospels and the non-synoptic gospels).

### 37. Is this a partition of G?
- { Matthew, Luke }
- { John, Mark }

Yes. (This partitions the elements into the gospels which feature a Christmas story and those that don't).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 41

Context: ## 2.13. EXERCISES

38. Is this a partition of \( G \)?

   Yes. (This partitions the elements into the gospels that were written by Jews, those that were written by Greeks, those that were written by Romans, and those that were written by Americans.)
   - \( \{ Matthew, John \} \)
   - \( \{ Luke \} \)
   - \( \{ Mark \} \)
   - \( \emptyset \)

39. What’s the power set of \( \{ Rihanna \} \)?

   \( \{ \{ Rihanna \}, \emptyset \} \)

   Yes, since \( \{ peanut, jelly \} \) is one of the eight subsets of \( \{ peanut, butter, jelly \} \). (Can you name the other seven?)

40. Is \( \{ peanut, jelly \} \in P(\{ peanut, butter, jelly \}) \)?

41. Is it true for every set \( S \) that \( S \in P(S) \)?

   Yep.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 43

Context: # Chapter 3

## Relations

Sets are fundamental to discrete math, both for what they represent in themselves and for how they can be combined to produce other sets. In this chapter, we’re going to learn a new way of combining sets, called relations.

### 3.1 The Idea of a Relation

A relation between a set \( X \) and \( Y \) is a subset of the Cartesian product. That one sentence packs in a whole heck of a lot, so spend a moment thinking deeply about it. Recall that \( X \times Y \) yields a set of ordered pairs, one for each combination of an element from \( X \) and an element from \( Y \). If \( X \) has 5 elements and \( Y \) has 4, then \( X \times Y \) is a set of 20 ordered pairs. To make it concrete, if \( X \) is the set \{ Harry, Ron, Hermione \} and \( Y \) is the set \{ Dr. Pepper, Mt. Dew \}, then \( X \times Y \) is:

- \( (Harry, Dr. Pepper) \)
- \( (Harry, Mt. Dew) \)
- \( (Ron, Dr. Pepper) \)
- \( (Ron, Mt. Dew) \)
- \( (Hermione, Dr. Pepper) \)
- \( (Hermione, Mt. Dew) \)

Convince yourself that every possible combination is in there. I listed them out methodically to make sure I didn’t miss any (all the Harry’s first, with each drink in order, then all the Ron’s, etc.), but of course there’s no order to the members of a set, so I could have listed them in any order.

Now if I define a relation between \( X \) and \( Y \), I’m simply specifying that certain of these ordered pairs are in the relation, and certain
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 49

Context: # 3.5. Properties of Endorelations

nor (me, you) would be in the relation, but in any event the two cannot co-exist.

Note that **antisymmetry** is very different from **symmetric**. An asymmetric relation is simply one that's not symmetric: in other words, there's some \((x, y)\) in there without a matching \((y, x)\). An antisymmetric relation, on the other hand, is one in which there are guaranteed to be no matching \((y, x)\) for any \((x, y)\).

If you have trouble visualizing this, here's another way to think about it: realize that most relations are neither symmetric nor antisymmetric. It's kind of a coincidence for a relation to be symmetric: that would mean for every single \((x, y)\) it contains, it also contains a \((y, x)\). (What are the chances?) Similarly, it's kind of a coincidence for a relation to be antisymmetric: that would mean for every single \((x, y)\) it contains, it doesn't contain a \((y, x)\). (Again, what are the chances?) Your average Joe relation is going to contain some \((x, y)\) pairs that have matching \((y, x)\) pairs, and some that don't have matches. Such relations (the vast majority) are simply asymmetric: that is, neither symmetric nor antisymmetric.

Shockingly, it's actually possible for a relation to be both symmetric and antisymmetric! (But not asymmetric.) For instance, the empty relation (with no ordered pairs) is both symmetric and antisymmetric. It's symmetric because for every ordered pair \((x, y)\) in it (of which there are zero), there's also the corresponding \((y, x)\). And similarly, for every ordered pair \((y, x)\), the corresponding \((x, y)\) is not present. Another example is a relation with only “doubles” in it — say, \((3, 3)\), \((7, 7)\), \((Fred, Fred)\). This, too, is both symmetric and antisymmetric (work it out!).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 52

Context: # CHAPTER 3. RELATIONS

I contend that in this toy example, “isAtLeastAsToughAs” is a partial order, and \( A \) along with \( isAtLeastAsToughAs \) together form a poset. I reason as follows. It’s reflexive, since we started off by adding every dog with itself. It’s antisymmetric, since when we need add both \( (x, y) \) and \( (y, x) \) to the relation. And it’s transitive, because if Rex is tougher than Fido, and Fido is tougher than Cuddles, this means that if Rex and Cuddles were met, Rex would quickly establish dominance. (I’m no zoologist, and am not sure if the last condition truly applies with real dogs. But let’s pretend it does.)

It’s called a “partial order” because it establishes a partial, but incomplete, hierarchy among dogs. If we ask, “is dog X tougher than dog Y?” the answer is never ambiguous. We’re never going to say, “well, dog X was superior to dog A, who was superior to dog Y . . .” but then again, dog Y was superior to dog B, who was superior to dog X, so there’s no telling which of X and Y is truly toughest.” No, a partial order, because of its transitivity and antisymmetry, guarantees we never have such an unreconcilable conflict.

However, we could have a lack of information. Suppose Rex has never met Killer, and nobody Rex has met has ever met anyone Killer has met. There’s no chain between them. They’re in two separate universes as far as we’re concerned, and we’d have no way of knowing which was toughest. It doesn’t have to be that extreme, though. Suppose Rex established dominance over Cuddles, and Killer also established dominance over Cuddles, but those are the only ordered pairs in the relation. Again, there’s no way to tell whether Rex or Killer is the tougher dog. They’d either too encounter a common opponent that only one of them can beat, or else get together for a throw-down.

So a partial order gives us some semblance of structure — the relation establishes a directionality, and we’re guaranteed not to get wrapped up in contradictions — but it doesn’t completely order all the elements. If it does, it’s called a total order.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 54

Context: # CHAPTER 3. RELATIONS

“Which element of \( Y \) goes with \( X \)?” and we will always get back a well-defined answer. We can’t really do that with relations in general, because the answer might be “none” or “several.” Take a look back at the \( R \) and \( S \) examples above: what answer would we get if we asked “which drink goes Hermione map to?” for either relation? Answer: there is no answer.

But with functions, I can freely ask that question because I know I’ll get a kosher answer. With \( F \), I can ask, “which drink does Hermione map to?” and the answer is “Mt. Dew.” In symbols, we write this as follows:

\[ F(\text{Hermione}) = \text{Mt. Dew} \]

This will look familiar to computer programmers since it resembles a function call. In fact, it is a function call. That’s exactly what it is. Functions in languages like C++ and Java were in fact named after this discrete math notion. And if you know anything about programming, you know that in a program I can “call the \( F \) function” and “pass it the argument ‘Hermione’” and “get the return value ‘Mt. Dew’.” I never have to worry about getting more than one value back or getting none at all.

You might also remember discussing functions in high school math, and the so-called “vertical line test.” When you plotted the values of a numerical function on a graph, and there was no vertical (up-and-down) line that intersected more than one point, you could safely call the plot a “function.” That’s really exactly the same as the condition I just gave for functions, stated graphically. If a function passes the vertical line test, then there is no \( y \) value for which there’s more than one \( x \) value. This means it makes sense to ask “which is the value of \( y \) for a particular value of \( x \)?” You’ll always get one and only one answer. There’s no such thing, of course, as a “horizontal line test,” since functions are free to map more than one \( x \) value to the same \( y \) value. They just can’t do the reverse.

The difference between the functions of high school math and the functions we’re talking about here, by the way, is simply that our
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 55

Context: # 3.6 FUNCTIONS

Functions aren’t necessarily numeric. Sometimes we do draw “plots” of sorts, though, like this one:

```
Harry          Mt. Dew
          ↘    
Ron ──────── Dr. Pepper
          ↖    
Hermione
```
*Figure 3.1: A function represented graphically.*

This simply shows which elements of the domain map to which elements of the codomain. The left blob is the domain, the right blob is the codomain, and there’s an arrow representing each mapping.

Now as with relations, functions normally have “meaning.” We could define a function called `firstTasted` that associates each wizard with the soft drink he or she first sampled as a child. We could define another called `faveDrink` that maps each wizard to his or her favorite — presuming that every wizard has a favorite drink in the set (Hermione will have to overlook her iced tea and choose among the options provided). A third function called `wouldChooseWithMexicanFood` provides information about which drink each wizard provides with that type of cuisine. Here are Ron’s values for each of the three functions:

- `firstTasted(Ron) = Mt. Dew`
- `faveDrink(Ron) = Mt. Dew`
- `wouldChooseWithMexicanFood(Ron) = Dr. Pepper`

These values indicate that Mt. Dew was the soda pop that Ron first sipped, and it has been his favorite ever since, although at La Estrellita he prefers a Pepper.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 60

Context: # CHAPTER 3. RELATIONS

own wizard, and no soft drinks (or wizards) are left out. How exciting! This is a perfectly bijective function, also called a **bijection**. Again, the only way to get a bijection is for the domain and codomain to be the same size (although that alone does not guarantee a bijection; witness \( f \), above). Also observe that if they are the same size, then injectivity and surjectivity go hand-in-hand. Violate one, and you’re bound to violate the other. Uphold the one, and you’re bound to uphold the other. There’s a nice, pleasing, symmetrical elegance to the whole idea.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 65

Context: # 3.8. EXERCISES 

29. **Is faveSport injective?**  
   No, because Julie and Sam (and Chuck) all map to the same value (basketball). For a function to be injective, there must be no two domain elements that map to the same codomain element.

30. **Is there any way to make it injective?**  
   Not without altering the underlying sets. There are three athletes and two sports, so we can’t help but map multiple athletes to the same sport.

31. **Fine. Is faveSport surjective?**  
   No, because no one maps to volleyball.

32. **Is there any way to make it surjective?**  
   Sure, for instance, change Sam from basketball to volleyball. Now both of the codomain elements are "reachable" by some domain element, so it’s surjective.

33. **Is faveSport now also bijective?**  
   No, because it’s still not injective.

34. **How can we alter things so that it’s bijective?**  
   One way is to add a third sport—say, kickboxing—and move either Julie or Chuck over to kickboxing. If we have Julie map to kickboxing, Sam map to volleyball, and Chuck map to basketball, we have a bijection.

35. **How do we normally write the fact that "Julie maps to kickboxing"?**  
   faveSport(Julie) = kickboxing.

36. **What’s another name for "injective"?**  
   One-to-one.

37. **What’s another name for "surjective"?**  
   Onto.

38. **What’s another name for "range"?**  
   Image.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 67

Context: # Chapter 4

## Probability

Probability is the study of **uncertainty**. This may seem like a hopeless endeavor, sort of like knowing the unknowable, but it's not. The study of probability gives us tools for taming the uncertain world we live and program in, and for reasoning about it in a precise and helpful way.

We may not know exactly how long a particular visitor is willing to wait for our webpage to load in their browser, but we can use probability to estimate how much traffic we’ll lose if this takes longer than a certain average duration. We may not know which specific passwords a hacker will try as he attempts to break our security protocol, but we can use probability to estimate how feasible this approach will be for him. We may not know exactly when a certain program will run out of RAM and have to swap its data out to virtual memory, but we can predict how often this is likely to occur — and how painful it will be for us — given a certain system load and user behavior.

The trick is to use the tools we've already built — sets, relations, functions — to characterize and structure our notions of the relative likelihood of various outcomes. Once those underpinnings are secured, a layer of deductive reasoning will help us make good use of that information to begin to predict the future.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 68

Context: # 4.1 Outcomes and events

Since life is uncertain, we don’t know for sure what is going to happen. But let’s start by assuming we know what things might happen. Something that might happen is called an outcome. You can think of this as the result of an experiment if you want to, although normally we won’t be talking about outcomes that we have explicitly manipulated and measured via scientific means. It’s more like we’re just curious how some particular happening is going to turn out, and we’ve identified the different ways it can turn out and called them outcomes.

Now we’ve been using the symbol \( \Omega \) to refer to “the domain of discourse” or “the universal set” or “all the stuff we’re talking about.” We’re going to give it yet another name now: the sample space. \( \Omega \), the sample space, is simply the *set of all possible outcomes*. Any particular outcome — call it \( O \) — is an element of this set, just like in Chapter 1 every conceivable element was a member of the domain of discourse.

If a woman is about to have a baby, we might define \( \Omega \) as \{ boy, girl \}. Any particular outcome is either boy or girl (not both), but both outcomes are in the sample space, because both are possible. If we roll a die, we’d define \( \Omega \) as \{ 1, 2, 3, 4, 5, 6 \}. If we’re interested in motor vehicle safety, we might define \( \Omega \) for a particular road trip as \{ safe, accident \}. The outcomes don’t have to be equally likely; an important point we’ll return to soon.

In probability, we define an event as a *subset of the sample space*. In other words, an event is a group of related outcomes (though an event might contain just one outcome, or even zero). I always thought this was a funny definition for the word “event”: it’s not the first thing that word brings to mind. But it turns out to be a useful concept, because sometimes we’re not interested in any particular outcome necessarily, but rather in whether the outcome — whatever it is — has a certain property. For instance, suppose at the start of some game, my opponent and I each roll the die, agreeing that the highest roller gets to go first. Suppose he rolls a 2. Now it’s my turn. The \( \Omega \) for my die roll is of course \{ 1, 2, 3 \}.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 71

Context: # 4.2. PROBABILITY MEASURES

Now it turns out that not just any function will do as a probability measure, even if the domain (events) and codomain (real numbers in the range [0,1]) are correct. In order for a function to be a “valid” probability measure, it must satisfy several other rules:

1. \( \text{Pr}(\Omega) = 1 \)
2. \( \text{Pr}(A) \geq 0 \) for all \( A \subseteq \Omega \)
3. \( \text{Pr}(A \cup B) = \text{Pr}(A) + \text{Pr}(B) - \text{Pr}(A \cap B) \)

Rule 1 basically means "something has to happen." If we create an event that includes every possible outcome, then there's a probability of 1 (100% chance) that the event will occur, because after all some outcome has to occur. (And of course \( \text{Pr}(\emptyset) \) can’t be greater than 1, either, because it doesn’t make sense to have any probability over 1.) Rule 2 says there’s no negative probabilities: you can’t define any event, no matter how remote, that has a less than zero chance of happening.

Rule 3 is called the “additivity property,” and is a bit more difficult to get your head around. A diagram works wonders. Consider Figure 4.1, called a "Venn diagram," which visually depicts sets and their contents. Here we have defined three events: \( F \) (as above) is the event that the winner is a woman; \( R \) is the event that the winner is a rock musician (perhaps in addition to other musical genres); and \( U \) is the event that the winner is underage (i.e., becomes a multimillionaire before they can legally drink). Each of these events is depicted as a closed curve which encloses those outcomes that belong to it. There is obviously a great deal of overlap.

Now back to rule 3. Suppose I ask “what’s the probability that the All-time Idol winner is underage or a rock star?” Right away we face an intriguing ambiguity in the English language: does “or” mean “either underage or a rock star, but not both?” Or does it mean “underage and/or rock star?” The former interpretation is called an exclusive or and the latter an inclusive or. In computer science, we will almost always be assuming an inclusive or, unless explicitly noted otherwise.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 75

Context: # 4.3. PHILOSOPHICAL INTERLUDE

The idea of how often heads will occur in the long run. But if we flip it a million times and get 500,372 heads, we can confidently say that the probability of getting a head on a single flip is approximately .500.

This much isn’t controversial: it’s more like common sense. But the frequentist philosophy states that this is really the only way that probability can be defined. It’s what probability is: the frequency with which we can expect certain outcomes to occur, based on our observations of their past behavior. Probabilities only make sense for things that are repeatable, and reflect a known, reliable trend in how often they produce certain results. Historical proponents of this philosophy include John Venn, the inventor of the aforementioned Venn diagram, and Ronald Fisher, one of the greatest biologists and statisticians of all time.

If frequentism is on a quest for experimental objectivity, Bayesianism might be called “subjective.” This isn’t to say it’s arbitrary or sloppy. It simply has a different notion of what probability ultimately means. Bayesians interpret probability as a quantitative personal assessment of the likelihood of something happening. They point out that for many (most) events of interest, trials are neither possible nor sensible. Suppose I’m considering asking a girl out to the prom, and I’m trying to estimate how likely it is she’ll go with me. It’s not like I’m going to ask her a hundred times and count how many times she says yes, then divide by 100 to get a probability. There is in fact no way to perform a trial or use past data to guide me, and at any rate she’s only going to say yes or no once. So based on my background knowledge and my assumptions about her, myself, and the world, I form an opinion which could be quantified as a “percent chance.”

Once I’ve formed this opinion (which of course involves guesswork and subjectivity) I can then reason about it mathematically, using all the tools we’ve been developing. Of special interest to Bayesians is the notion of updating probabilities when new information comes to light, a topic we’ll return to in a moment. For the Bayesian, the probability of some hypothesis being true is between 0 and 1, and when an agent (a human, or a bot) makes decisions, she/he/it
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 81

Context: # 4.5 TOTAL PROBABILITY

Pr(A) = Pr(A ∩ C₁) + Pr(A ∩ C₂) + ... + Pr(A ∩ C_N)  
= Pr(A|C₁)Pr(C₁) + Pr(A|C₂)Pr(C₂) + ... + Pr(A|C_N)Pr(C_N)  
= ∑ᵏ₌₁ Pr(A|Cₖ)Pr(Cₖ)  

is the formula.¹

Let's take an example of this approach. Suppose that as part of a promotion for Muvico Cinemas movie theatre, we're planning to give a door prize to the 1000th customer this Saturday afternoon. We want to know, though, the probability that this person will be a minor. Figuring out how many patrons overall will be under 18 might be difficult. But suppose we're showing these three films on Saturday: *Spider-Man: No Way Home*, *Here Before*, and *Sonic the Hedgehog 2*. We can estimate the fraction of each movie's viewers that will be minors: .6, .01, and .95, respectively. We can also predict how many tickets will be sold for each film: 2,000 for *Spider-Man*, 500 for *Here Before*, and 1,000 for *Sonic*.

Applying frequentist principles, we can compute the probability that a particular visitor will be seeing each of the movies:

- \(Pr(Spider-Man) = \frac{2000}{2000 + 500 + 1000} = 0.571\)

- \(Pr(Here \ Before) = \frac{500}{2000 + 500 + 1000} = 0.143\)

- \(Pr(Sonic) = \frac{1500}{2000 + 500 + 1000} = 0.286\)

¹ If you're not familiar with the notation in that last line, realize that \(Σ\) (a capital Greek "sigma") just represents a sort of loop with a counter. The \(k\) in the sign means that the counter is \(k\) and starts at 1. The "N" above the line means the counter goes up to \(N\), which is its last value. And what does the loop do? It adds up a cumulative sum. The thing being added to the total each time through the loop is the expression to the right of the sign. The last line with the \(Σ\) is just a more compact way of expressing the preceding line.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 85

Context: # 4.6. BAYES' THEOREM

See how that works? If I do have the disease (and there's a 1 in 1,000 chance of that), there's a .99 probability of me testing positive. On the other hand, if I don't have the disease (a 999 in 1,000 chance of that), there's a .01 probability of me testing positive anyway. The sum of those two mutually exclusive probabilities is 0.01098.

Now we can use our Bayes’ Theorem formula to deduce:

\[ 
P(D|T) = \frac{P(T|D) P(D)}{P(T)} 
\]

\[
P(D|T) = \frac{.99 \cdot \frac{1}{1000}}{0.01098} \approx .0902 
\]

Wow. We tested positive on a 99% accurate medical exam, yet we only have about a 9% chance of actually having the disease! Great news for the patient, but a head-scratcher for the math student. How can we understand this? Well, the key is to look back at that Total Probability calculation in equation 4.1. Remember that there were two ways to test positive: one where you had the disease, and one where you didn't. Look at the contribution to the whole that each of those two probabilities produced. The first was 0.00099, and the second was 0.9999, over ten times higher. Why? Simply because the disease is so rare. Think about it: the test fails once every hundred times, but a random person only has the disease once every thousand times. If you test positive, it’s far more likely that the test screwed up than that you actually have the disease, which is rarer than blue moons.

Anyway, all of this about diseases and tests is a side note. The main point is that Bayes' Theorem allows us to recast a search for \(P(X|Y)\) into a search for \(P(Y|X)\), which is often easier to find numbers for.

One of many computer science applications of Bayes' Theorem is in text mining. In this field, we computationally analyze the words in documents in order to automatically classify them or form summaries or conclusions about their contents. One goal might be to identify the true author of a document, given samples of the writing of various suspected authors. Consider the Federalist Papers,
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 89

Context: # 4.7 INDEPENDENCE

Whoops!

One last point on the topic of independence: please don’t make the mistake of thinking that mutually exclusive events are independent! This is by no means the case, and, in fact, the opposite is true. If two events are mutually exclusive, they are extremely dependent on each other!

Consider the most trivial case: I choose a random person on campus, and define \( I \) as the event that they’re an in-state student, and \( O \) as the event that they’re out-of-state. Clearly these events are mutually exclusive. But are they independent? Of course not! Think about it: if I told you a person was in-state, would that tell you anything about whether they were out-of-state? Duh. In a mutual exclusive case like this, event \( I \) completely rules out \( O \) (and vice versa), which means that although \( \text{Pr}(I) \) might be .635, \( \text{Pr}(I|O) \) is a big fat zero. More generally, \( \text{Pr}(A|B) \) is most certainly not going to be equal to \( \text{Pr}(A) \) if the two events are mutually exclusive, because learning about one event tells you everything about the other.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 95

Context: # 5.1. GRAPHS

The edges, we have precisely the structure of a graph. Psychologists have given this another name: a semantic network. It is thought that the myriad of concepts you have committed to memory — Abraham Lincoln, and bar of soap, and my fall schedule, and perhaps millions of others — are all associated in your mind in a vast semantic network that links the related concepts together. When your mind recalls information, or deduces facts, or even drifts randomly in idle moments, it’s essentially traversing this graph along the various edges that exist between vertices.

That’s deep. But you don’t have to go near that deep to see the appearance of graph structures all throughout computer science. What’s MapQuest, if not a giant graph where the vertices are travelable locations and the edges are routes between them? What’s Facebook, if not a giant graph where the vertices are people and the edges are friendships? What’s the World Wide Web, if not a giant graph where the vertices are pages and the edges are hyperlinks? What’s the Internet, if not a giant graph where the vertices are computers or routers and the edges are communication links between them? This simple scheme of linked vertices is powerful enough to accommodate a whole host of applications, which is worth studying.

## Graph Terms

The study of graphs brings with it a whole bevy of new terms which are important to use precisely:

1. **Vertex.** Every graph contains zero or more vertices.¹ (These are also sometimes called nodes, concepts, or objects.)

2. **Edge.** Every graph contains zero or more edges. (These are also sometimes called links, connections, associations, or relationships.) Each edge connects exactly two vertices, unless the edge connects a vertex to itself, which is possible, believe it.

> ¹ The phrase "zero or more" is common in discrete math. In this case, it indicates that the empty graph, which contains no vertices at all, is still a legitimate graph.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 99

Context: # 5.1. GRAPHS

A graph of dependencies like this must be both **directed** and **acyclic**, or it wouldn’t make sense. Directed, of course, means that task X can require task Y to be completed before it, without the reverse also being true. If they both depended on each other, we’d have an infinite loop, and no brownies could ever get baked! Acyclic means that no kind of cycle can exist in the graph, even one that goes through multiple vertices. Such a cycle would again result in an infinite loop, making the project hopeless. Imagine if there were an arrow from **bake for 30 mins** back to **grease pan** in Figure 5.4. Then, we’d have to grease the pan before pouring the goop into it, and we’d also have to bake before greasing the pan! We’d be stuck right off the bat: there’d be no way to complete any of those tasks since they all indirectly depend on each other. A graph that is both directed and acyclic (and therefore free of these problems) is sometimes called a **DAG** for short.

```
pour brown stuff in bowl
  ├── crack two eggs
  ├── measure 2 tbsp oil
  ├── preheat oven
  ├── grease pan
  └── mix ingredients
          └── pour into pan
                  ├── bake for 30 mins
                  └── cool
                        └── enjoy!
```

*Figure 5.4: A DAG.*
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 100

Context: # CHAPTER 5. STRUCTURES

## Spatial Positioning

One important thing to understand about graphs is which aspects of a diagram are relevant. Specifically, the spatial positioning of the vertices doesn’t matter. In Figure 5.2 we view Muhammad Ali in the mid-upper left, and Sonny Liston in the extreme upper right. But this was an arbitrary choice, and irrelevant. More specifically, this isn’t part of the information the diagram claims to represent. We could have positioned the vertices differently, as in Figure 5.5, and had the same graph. In both diagrams, there are the same vertices, and the same edges between them (check me). Therefore, these are mathematically the same graph.

```
George Foreman
Sonny Liston
Muhammad Ali
Joe Frazier
```

Figure 5.5: A different look to the same graph as Figure 5.2.

This might not seem surprising for the prize fighter graph, but for graphs like the MapQuest graph, which actually represent physical locations, it can seem jarring. In Figure 5.3 we could have drawn Richmond north of Fredericksburg, and Virginia Beach on the far west side of the diagram, and still had the same graph, provided that all the nodes and links were the same. Just remember that the spatial positioning is designed for human convenience, and isn’t part of the mathematical information. It’s similar to how there’s no order to the elements of a set, even though when we specify a set extensionally, we have to list them in some order to avoid writing all the element names on top of each other. On a graph diagram, we have to draw each vertex somewhere, but where we put it is simply aesthetic.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 102

Context: # CHAPTER 5. STRUCTURES

## Figure 5.7: A graph depicting a symmetric endorelationship.

```
(Harry, Ron)  
(Ron, Harry)  
(Ron, Hermione)  
(Hermione, Ron)  
(Harry, Harry)  
(Neville, Neville)
```

```
    Harry
      |
      |
    Ron
    / \
   /   \
Hermione  Neville
```

Notice how the loops (edges from a node back to itself) in these diagrams represent ordered pairs in which both elements are the same.

Another connection between graphs and sets has to do with partitions. Figure 5.7 was not a connected graph: Neville couldn't be reached from any of the other nodes. Now consider: isn't a graph like this similar in some ways to a partition of \( A \) — namely, this one?

- \( \{ Harry, Ron, Hermione \} \) 
- \( \{ Neville \} \)

We've simply partitioned the elements of \( A \) into the groups that are connected. If you remove the edge between Harry and Ron in that graph, you have:

- \( \{ Harry \} \) 
- \( \{ Ron, Hermione \} \) 
- \( \{ Neville \} \)

Then add one between Hermione and Neville, and you now have: 

- \( \{ Harry \} \)
- \( \{ Ron, Hermione, Neville \} \)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 109

Context: # Graphs

## Depth-first traversal (DFT)

1. Choose a starting node.
2. Mark it and push it on an empty stack.
3. While the stack is not empty, do these steps:
   - a) Pop the top node off the stack.
   - b) Visit it.
   - c) Mark and push all of its unmarked adjacent nodes (in any order).

The algorithm in action is shown in Figure 5.11. The stack really made a difference! Instead of alternately exploring Chuck's and Izzy's paths, it bullheadedly darts down Chuck's path as far as it can go, all the way to hitting Izzy's back door. Only then does it back out and visit Izzy. This is because the stack always pops off what it just pushed on, whereas whatever got pushed first has to wait until everything else is done before it gets its chance. That first couple of pushes was critical: if we had pushed Chuck before Izzy at the very beginning, then we would have explored Chuck's entire world before arriving at Chuck's back door, instead of the other way around. As it is, Izzy got put on the bottom, and so she stayed on the bottom, which is inevitable with a stack.

DFT identifies disconnected graphs in the same way as BFT, and it similarly avoids getting stuck in infinite loops when it encounters cycles. The only difference is the order in which it visits the nodes.

## Finding the shortest path

We'll look at two other important algorithms that involve graphs, specifically weighted graphs. The first one is called **Dijkstra's shortest-path algorithm**. This is a procedure for finding the shortest path between two nodes, if one exists. It was invented in 1956 by the legendary computer science pioneer Edsger Dijkstra.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 119

Context: 5.2 Trees
=========

you think that the shortest path between any two nodes would land right on this Prim network? Yet if you compare Figure 5.14 with Figure 5.13 you'll see that the quickest way from Bordeaux to Strasbourg is through Marseille, not Vichy.

So we end up with the remarkable fact that the shortest route between two points has nothing whatsoever to do with the shortest total distance between all points. Who knew?

5.2 Trees
---------

A tree is really nothing but a simplification of a graph. There are two kinds of trees in the world: free trees, and rooted trees.  

### Free trees

A free tree is just a connected graph with no cycles. Every node is reachable from the others, and there’s only one way to get anywhere. Take a look at Figure 5.15. It looks just like a graph (and it is) but unlike the WWII France graph, it’s more skeletal. This is because in some sense, a free tree doesn’t contain anything “extra.”

If you have a free tree, the following interesting facts are true:

1. There’s exactly one path between any two nodes. (Check it!)
2. If you remove any edge, the graph becomes disconnected. (Try it!)
3. If you add any new edge, you end up adding a cycle. (Try it!)
4. If there are n nodes, there are n - 1 edges. (Think about it!)

So basically, if your goal is connecting all the nodes, and you have a free tree, you’re all set. Adding anything is redundant, and taking away anything breaks it.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 129

Context: # 5.2. TREES

![Figure 5.20: The order of node visitation in in-order traversal.](path/to/image)

After visiting the root, says, “Okay, let’s pretend we started this whole traversal thing with the smaller tree rooted at my left child. Once that’s finished, wake me up so I can similarly start it with my right child.” Recursion is a very common and useful way to solve certain complex problems, and trees are rife with opportunities.

## Sizes of binary trees

Binary trees can be any ragged old shape, like our Figure 5.17 example. Sometimes, though, we want to talk about binary trees with a more regular shape that satisfy certain conditions. In particular, we’ll talk about three special kinds:

1. **Full binary tree.** A full binary tree is one in which every node (except the leaves) has two children. Put another way, every node has either two children or none; no strangeness allowed. Figure 5.17 is not full, but it would be if we added the three blank nodes in Figure 5.21.

2. By the way, it isn’t always possible to have a full binary tree with a particular number of nodes. For instance, a binary tree with two nodes can't be full, since it inevitably will have a root with only one child.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 142

Context: # CHAPTER 5. STRUCTURES

Thirdly, there are specialized data structures you may learn about in future courses, such as AVL trees and red-black trees, which are binary search trees that add extra rules to prevent imbalancing. Basically, the idea is that when a node is inserted (or removed), certain metrics are checked to make sure that the change didn’t cause too great an imbalance. If it did, the tree is adjusted to minimize the imbalance. This comes at a slight cost every time the tree is changed, but prevents any possibility of a lopsided tree that would cause slow lookups in the long run.

## 5.3 Final Word

Whew, that was a lot of information about structures. Before we continue our walk in the next chapter with a completely different topic, I’ll leave you with this summary thought. Let \( BST \) be the set of Binary Search Trees, and \( BT \) be the set of Binary Trees. Let \( RT \) be the set of rooted trees, and \( T \) be the set of trees (free or rooted). Finally, let \( CG \) be the set of connected graphs, and \( G \) the set of all graphs. Then we have:

\[ 
BST \subseteq BT \subseteq RT \subseteq CG \subseteq G 
\]

It’s a beautiful thing.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 144

Context: # CHAPTER 5. STRUCTURES

10. How many vertices and edges are there in the graph below?

   ![Graph](image_reference_placeholder)

   **7 and 10, respectively.**

11. What's the degree of vertex L?  
   **It has an in-degree of 2, and an out-degree of 1.**

12. Is this graph directed?  
   **Yes.**

13. Is this graph connected?  
   **Depends on what we mean. There are two different notions of “connectedness” for directed graphs. One is strongly connected, which means every vertex is reachable from any other by following the arrows in their specified directions. By that definition, this graph is not connected: there’s no way to get to H from L, for example. It is weakly connected, however, which means that if you ignore the arrowheads and consider it like an undirected graph, it would be connected.**

14. Is it a tree?  
   **No. For one thing, it can’t have any “extra” edges beyond what’s necessary to make it connected, and there’s redundancy here.**

15. Is it a DAG?  
   **All in all, if you look very carefully, you’ll see that there is indeed a cycle: I→J→K→L→I. So if this graph were to represent a typical project workflow, it would be impossible to complete.**

16. If we reversed the direction of the I→J edge, would it be a DAG?  
   **Yes. The edges could now be completed in this order: H, G, L, I, M, K, and finally J.**
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 145

Context: # 5.4. EXERCISES

17. If this graph represented an endorelation, how many ordered pairs would it have?  
   **Answer:** 10.

18. Suppose we traversed the graph below in depth-first fashion, starting with node P. In what order would we visit the nodes?

   ![Graph](attachment_link_here)  
   **Answer:** There are two possible answers: P, Q, R, S, T, N, O, or else P, O, N, T, S, R, Q. (The choice just depends on whether we go "left" or "right" initially.) Note in particular that either O or Q is at the very end of the list.

19. Now we traverse the same graph breadth-first fashion, starting with node P. Now in what order would we visit the nodes?

   **Answer:** Again, two possible answers: P, O, Q, N, R, T, S, or else P, Q, R, N, S, T. Note in particular that both O and Q are visited very early.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 149

Context: # Chapter 6

## Counting

If the title of this chapter seems less than inspiring, it’s only because the kind of counting we learned as children was mostly of a straightforward kind. In this chapter, we’re going to learn to answer some more difficult questions like “how many different semester schedules could a college student possibly have?” and “how many different passwords can a customer choose for this e-commerce website?” and “how likely is this network buffer to overflow, given that its packets are addressed to three different destinations?”

The more impressive-sounding name for this topic is **combinatorics**. In combinatorics, we focus on two tasks: counting things (to find out how many there are), and enumerating things (to systematically list them as individuals). Some things turn out to be hard to count but easy to enumerate, and vice versa.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 150

Context: # 6.1 The Fundamental Theorem

We start with a basic rule that goes by the audacious name of **The Fundamental Theorem of Counting**. It goes like this:

If a whole can be divided into \( k \) parts, and there’s \( n_i \) choices for the \( i \)-th part, then there’s \( n_1 \times n_2 \times n_3 \times \cdots \times n_k \) ways of doing the whole thing.

## Example

Jane is ordering a new Lamborghini. She has twelve different paint colors to choose from (including Luscious Red and Sassy Yellow), three different interiors (Premium Leather, Bonded Leather, or Vinyl), and three different stereo systems. She must also choose between automatic and manual transmission, and she can get power locks & windows (or not). How many different configurations does Jane have to choose from? 

To put it another way, how many different kinds of cars could come off the line for her?

The key is that every one of her choices is independent of all the others. Choosing an Envious Green exterior doesn't constrain her choice of transmission, stereo, or anything else. So no matter which of the 12 paint colors she chooses, she can independently choose any of the three interiors, and no matter what these first two choices are, she can freely choose any of the stereos, etc. It’s a mix-and-match. Therefore, the answer is:

\[ 
12 \times 3 \times 3 \times 2 = 432 
\]

Here’s an alternate notation you’ll run into for this, by the way:

**How many other "Fundamental Theorems" of math do you know?** Here are a few: the Fundamental Theorem of Arithmetic says that any natural number can be broken down into its prime factors in only one way. The Fundamental Theorem of Algebra says that the highest power of a polynomial is how many roots (zeros) it has. The Fundamental Theorem of Linear Algebra says that the row space and the column space of a matrix have the same dimension. The Fundamental Theorem of Calculus says that integration and differentiation are the inverses of each other.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 151

Context: # 6.1. THE FUNDAMENTAL THEOREM

which is just a shorter way of writing

\[ n_1 \times n_2 \times n_3 \cdots \times n_k. \]

As mentioned in section 4.5, the \( \Sigma \) notation is essentially a loop with a counter, and it says to add up the expression to the right of it for each value of the counter. The \( \Pi \) notation is exactly the same, only instead of adding the expressions together for each value of the counter, we're multiplying them. (The reason mathematicians chose the symbols \( \Sigma \) (sigma) and \( \Pi \) (pi) for this, by the way, is that “sigma” and “pi” start with the same letter as “sum” and “product,” respectively.)

We can actually get a lot of leverage just with the fundamental theorem. How many different PINs are possible for an ATM card? There are four digits, each of which can be any value from 0 to 9 (ten total values), so the answer is:

\[ 10 \times 10 \times 10 \times 10 = 10,000 \text{ different PINs.} \]

So a thief at an ATM machine frantically entering PINs at random (hoping to break your account before you call and stop your debit card) would have to try about 5,000 of them on average before cracking the code.

What about middle school bullies who are trying to break into your locker? Well, most combination locks are opened by a three-number sequence, each number of which is anything from 0 to 39. So there are:

\[ 40 \times 40 \times 40 = 64,000 \text{ different combinations.} \]

That's probably slightly overstated, since I'll bet consecutive repeat numbers are not allowed (Master probably doesn't manufacture a
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 152

Context: # CHAPTER 6. COUNTING

Every car in the state of Virginia must be issued its own license plate number. That's a lot of cars. How many different license plate combinations are available?

This one requires a bit more thought, since not all license numbers have the same number of characters. In addition to `SBE4756` and `PY9127`, you can also have `DAM6` or `LUV6` or even `U2`. How can we incorporate these?

The trick is to divide up our sets into mutually exclusive subsets, and then add up the cardinalities of the subsets. If only 7 characters fit on a license plate, then clearly every license plate number has either 1, 2, 3, 4, 5, 6, or 7 characters. And no license plate has two of these (i.e., there is no plate that is both 5 characters long and 6 characters long). Therefore they’re mutually exclusive subsets, and safe to add. This last point is often not fully appreciated, leading to errors. Be careful not to cavalierly add the cardinalities of non-mutually-exclusive sets! You’ll end up double-counting.

So we know that the number of possible license plates is equal to:

- The # of 7-character plates
- The # of 6-character plates
- The # of 5-character plates
- ...
- The # of 1-character plates.

Very well. Now we can figure out each one separately. How do we know how many 7-character plates there are? Well, if every character must be either a letter or a digit, then we have \(26 + 10 = 36\) choices for each character. This implies \(36^7\) different possible 7-character license plates. The total number of plates is therefore:

\[
36^7 + 36^6 + 36^5 + 36^4 + 36^3 + 36^2 + 36^1 = 60,603,140,212 \text{ plates}
\]

which is about ten times the population of the earth, so I think we're safe for now.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 155

Context: # 6.2 Permutations

Now, all we have to do is subtract to get:

**Total # of strings** - **# of illegal passwords** = **# of legit passwords**  
1,850,456,557,796,600,384 - 147,389,519,403,536,384 = 1,708,735,865,301,022,720

Legitimate passwords. Looks like we don’t lose much by requiring the non-alpha character.

The lesson learned is that if counting the elements in some set involves accounting for a lot of different sticky scenarios, it’s worth a try to count the elements *not* in the set instead, and see if that’s easier.

## Permutations

When we’re counting things, we often run into permutations. A **permutation** of n distinct objects is an arrangement of them in a sequence. For instance, suppose all three Davies kids need to brush their teeth, but only one of them can use the sink at a time. What order will they brush in? One possibility is Lizzy, then T.J., then Johnny. Another possibility is T.J., then Lizzy, then Johnny. Another is Johnny, then Lizzy, then T.J. These are all different permutations of the Davies kids. Turns out there are six of them (find all 6 for yourself)!

Counting the number of permutations is just a special application of the Fundamental Theorem of Counting. For the teeth brushing example, we have n = 3 different “parts” to the problem, each of which has n_k choices to allocate to it. There are three different Davies kids who could brush their teeth first, so n₁ = 3. Once that child is chosen, there are then two remaining children who could brush second, so n₂ = 2. Then, once we’ve selected a first-brusher and a second-brusher, there’s only one remaining choice for the third-brusher, so n₃ = 1. This means the total number of possible brushing orders is:

3 × 2 × 1 = 6.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 156

Context: # CHAPTER 6. COUNTING

This pattern comes up so much that mathematicians have established a special notation for it:

\[ n \times (n - 1) \times (n - 2) \cdots \times 1 = n! \; (\text{``n-factorial''}) \]

We say there are “3-factorial” different brushing orders for the Davies kids. For our purposes, the notion of factorial will only apply for integers, so there’s no such thing as 23.46! or π! (In advanced computer science applications, however, mathematicians sometimes do define factorial for non-integers.) We also define 0! to be 1, which might surprise you.

This comes up a heck of a lot. If I give you a jumbled set of letters to unscramble, like “KRBS” (think of the Jumble® word game in the newspaper), how many different unscramblings are there? The answer is 5! or 120, one of which is BRISK. Let's say I shuffle a deck of cards before playing War. How many different games of War are there? The answer is 52!, since any of the cards in the deck might be shuffled on top, then any but that top card could be second, then any of those two could be third, etc. Ten packets arrive near-simultaneously at a network router. How many ways can they be queued up for transmission? 10! ways, just like a larger Davies family.

The factorial function grows really, really fast, by the way, even faster than exponential functions. A five letter word like “BRISK” has 120 permutations, but “AMBIENTDROUSLY” has 87,178,291,200, ten times the population of the earth. The number of ways to shuffle a deck is 

\[ 52! \approx 8.065 \times 10^{67} \]

so I don’t think my boys will end up playing the same War game twice any time soon, nor my wife and I the same bridge hand.

---

**“War”** is a mindless card game which involves no strategy or decision-making on the part of the players. Once you shuffle the initial deck, the entire outcome of the game is fixed.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 163

Context: # 6.3. COMBINATIONS

1. Phil  
2. Bubba  
3. Tiger  

Another one is:  

1. Phil  
2. Tiger  
3. Bubba  

and yet another is:  

1. Tiger  
2. Phil  
3. Bubba  

Now the important thing to recognize is that in our present problem — counting the possible number of regional-bound golf trios — all three of these different partial permutations represent the same combination. In all three cases, it's Bubba, Phil, and Tiger who will represent our local golf association in the regional competition. So by counting all three of them as separate partial permutations, we've overcounted the combinations.

Obviously, we want to count Bubba/Phil/Tiger only once. Okay then. How many times did we overcount it when we counted partial permutations? The answer is that we counted this trio once for every way it can be permuted. The three permutations, above, were examples of this, and so are these three:

1. Tiger  
2. Bubba  
3. Phil  
1. Bubba  
2. Tiger  
3. Phil  
1. Bubba  
2. Phil  
3. Tiger  

This makes a total of six times that we (redundantly) counted the same combination when we counted the partial permutations. Why 6? Because that's the value of 3!, of course. There are 3! different ways to arrange Bubba, Phil, and Tiger, since that's just a straight permutation of three elements. And so we find that every threesome we want to account for, we have counted 6 times.

The way to get the correct answer, then, is obviously to correct for this overcounting by dividing by 6:

\[
\frac{45 \times 44 \times 43}{3!} = \frac{45 \times 44 \times 43}{6} = 14,190 \text{ different threesomes.}
\]
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 165

Context: ```
# 6.3. COMBINATIONS

The coefficients for this binomial are of course 1 and 1, since "n" really means "1 . n." Now if we multiply this by itself, we get:

$ (x + y) \cdot (x + y) = 2x^2 + 2xy + y^2 $,

the coefficients of the terms being 1, 2, and 1. We do it again:

$ (x^2 + 2xy + y^2) \cdot (x + y) = x^3 + 3x^2y + 3xy^2 $ 

to get 1, 3, 3, and 1, and do it again:

$ (x^3 + 3x^2y + 3xy^2 + y^3) \cdot (x + y) = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 $ 

to get 1, 4, 6, and 4. At this point you might be having flashbacks to Pascal's triangle, which perhaps you learned about in grade school, in which each entry in a row is the sum of the two entries immediately above it (to the left and right), as in Figure 6.1. (If you never learned that, don't worry about it.)

```
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
```
**Figure 6.1:** The first six rows of Pascal's triangle.

Now you might be wondering where I'm going with this. What do fun algebra tricks have to do with counting combinations of items? The answer is that the values of $ \binom{n}{k} $ are precisely the coefficients of these multiplied polynomials. Let n be 4, which corresponds to the last polynomial we multiplied out. We can then compute all the combinations of items taken from a group of four:

- $ \binom{4}{0} = 1 $ 
- $ \binom{4}{1} = 4 $
- $ \binom{4}{2} = 6 $ 
- $ \binom{4}{3} = 4 $ 
- $ \binom{4}{4} = 1 $ 

In other words, there is exactly **one** way of taking no items out of 4 (you simply don't take any). There are **four** ways of taking one out of 4.
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 167

Context: # 6.4 Summary

Most of the time, counting problems all boil down to a variation of one of the following three basic situations:

- \( n^k \) — this is when we have \( k \) different things, each of which is free to take on one of \( n \) completely independent choices.

- \( \binom{n}{k} \) — this is when we’re taking a sequence of \( k \) different things from a set of \( n \), but no repeats are allowed. (A special case of this is \( n! \), when \( k = n \).)

- \( \binom{n}{k} \) — this is when we’re taking \( k \) different things from a set of \( n \), but the order doesn’t matter.

Sometimes it’s tricky to deduce exactly which of these three situations apply. You have to think carefully about the problem and ask yourself whether repeated values would be allowed and whether it matters what order the values appear in. This is often subtle.

As an example, suppose my friend and I work out at the same gym. This gym has 18 different weight machines to choose from, each of which exercises a different muscle group. Each morning,
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 171

Context: # 6.5. EXERCISES

1. To encourage rivalry and gluttony, we're going to give a special certificate to the child who collects the most candy at the end of the night. And while we're at it, we'll give 2nd-place and 3rd-place certificates as well. How many different ways could our 1st-2nd-3rd contest turn out?  
   **Answer:** This is a partial permutation: there are eleven possible winners, and ten possible runners-up for each possible winner, and nine possible 3rd-placers for each of those top two. The answer is therefore 114, or 900. Wow! I wouldn’t have guessed that high.

2. Finally, what if we want every kid to get a certificate with their name and place-of-finish on it? How many possibilities? (Assume no ties.)  
   **Answer:** This is now a full-blown permutation: 111. It comes to 30,916,800 different orders-of-finish, believe it or not. I told you: this counting stuff can explode fast.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 175

Context: # 7.1. WHAT IS A “NUMBER?”

When you think of a number, I want you to try to erase the sequence of digits from your mind. Think of a number as what is: a **quantity**. Here's what the number seventeen really looks like:

```
   8
 8 8 8
   8
```

It’s just an **amount**. There are more circles in that picture than in some pictures, and less than in others. But in no way is it “two digits,” nor do the particular digits “1” and “7” come into play any more or less than any other digits.

Let’s keep thinking about this. Consider this number, which I’ll label “A”:

(A)

Now let’s add another circle to it, creating a different number I’ll call “B”:

(B)

And finally, we’ll do it one more time to get “C”:

(C)

(Look carefully at those images and convince yourself that I added one circle each time.)

When going from A to B, I added one circle. When going from B to C, I also added one circle. Now I ask you: was going from B to C any more “significant” than going from A to B? Did anything qualitatively different happen?

The answer is obviously no. Adding a circle is adding a circle; there’s nothing more to it than that. But if you had been writing
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 182

Context: # CHAPTER 7. NUMBERS

Another observation is that the value of \( n \mod k \) always gives a value between \( 0 \) and \( k - 1 \). We may not know at a glance what \( 407,332,117 \mod 3 \) is, but we know it can't be \( 12 \), or \( 4 \), or even \( 3 \), because if we had that many elements left after taking out groups of \( 3 \), we could still take out another group of \( 3 \). The remainder only gives us what's left after taking out groups, so by definition there cannot be an entire group (or more) left in the remainder.

The other operation we need is simply a "round down" operation, traditionally called "floor" and written with brackets: \( \lfloor x \rfloor \). The floor of an integer is itself. The floor of a non-integer is the integer just below it. So \( \lfloor 7 \rfloor = 7 \) and \( \lfloor 4.81 \rfloor = 4 \). It's that simple.

The reason we use the floor operator is just to get the whole number of times one number goes into another. \( [13 \div 3] = 4 \), for example. By using mod and floor, we get the quotient and remainder of a division, both integers. If our numbers are \( 25 \) and \( 7 \), we have \( [25 \div 7] = 3 \) and \( 25 \mod 7 = 4 \). Notice that this is equivalent to saying that \( 25 = 3 \times 7 + 4 \). We're asking "how many groups of \( 7 \) are in \( 25 \)?" and the answer is that \( 25 \) is equal to \( 3 \) groups of \( 7 \), plus \( 4 \) extra.

The general procedure for converting from one base to another is to repeatedly use mod and floor to strip out the digits from right to left. Here's how you do it:

## Express a numeric value in a base

1. Take the number mod the base. Write that digit down.

2. Divide the number by the base and take the floor:

   a) If you get zero, you're done.
   
   b) If you get non-zero, then make this non-zero number your new value, move your pencil to the left of the digit(s) you've already written down, and return to step 1.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 199

Context: # 7.4. BINARY (BASE 2)

But then again, if I say, “oh, that’s a two’s-complement number,” you’d first look at the leftmost bit, see that it’s a 1, and realize you’re dealing with a negative number. What is the negative of? You’d flip all the bits and add one to find out. This would give you `00110110`, which you’d interpret as a base 2 number and get \(60_{10}\). You’d then respond, “ah, then that’s the number \(-60_{10}\).” And you’d be right.

So what does `11000010` represent then? Is it 196, -68, or -60? The answer is **any of the three**, depending on what representation scheme you’re using. None of the data in computers or information systems has intrinsic meaning: it all has to be interpreted according to the syntactic and semantic rules that we invent. In math and computer science, anything can be made to mean anything; after all, we invent the rules.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 201

Context: # 7.5 EXERCISES

6. If I told you that 98,243,917,215 mod 7 was equal to 1, would you call me a liar without even having to think too hard?  
   No, you shouldn’t. It turns out that the answer is 3, not 1, but how would you know that without working hard for it?  

7. If I told you that 273,111,999,214 mod 6 was equal to 6, would you call me a liar without even having to think too hard?  
   Yes, you should. Any number mod 6 will be in the range 0 through 5, never 6 or above. (Think in terms of repeatedly taking out groups of six from the big number. The mod is the number of stones you have left when there are no more whole groups of six to take. If towards the end of this process there are six stones left, that’s not a remainder because you can get another whole group!)

8. Are the numbers 18 and 25 equal?  
   Of course not. Don’t waste my time.

9. Are the numbers 18 and 25 congruent mod 7?  
   Yes. If we take groups of 7 out of 18 stones, we’ll get two such groups (a total of 14 stones) and have 4 left over. And then, if we do that same with 25 stones, we’ll get three such groups (a total of 21 stones) and again have 4 left over. So they’re not congruent mod 7.

10. Are the numbers 18 and 25 congruent mod 6?  
    No. If we take groups of 6 out of 18 stones, we’ll get three such groups with nothing left over. But if we start with 25 stones, we’ll take out 4 such groups (for a total of 24 stones) and have one left over. So they’re not congruent mod 6.

11. Are the numbers 617,418 and 617,424 equal?  
    Of course not. Don’t waste my time.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 206

Context: # CHAPTER 8. LOGIC

A statement that has a "truth value," which means that it is either true or false. The statement "all plants are living beings" could be a proposition, as could "Barack Obama was the first African-American President" and "Kim Kardashian will play the title role in Thor: Love and Thunder." By contrast, questions like "are you okay?" cannot be propositions, nor can commands like "hurry up and answer already!" or phrases like "Lynn's newborn schnauzer," because they are not statements that can be true or false. (Linguistically speaking, propositions have to be in the indicative mood.)

We normally use capital letters (what else?) to denote propositions, like:

- Let **A** be the proposition that UMW is in Virginia.
- Let **B** be the proposition that the King of England is female.
- Let **C** be the proposition that dogs are carnivores.

Don't forget that a proposition doesn't have to be true in order to be a valid proposition (B is still a proposition, for example). It just matters that it is labeled and that it has the potential to be true or false.

Propositions are considered atomic. This means that they are indivisible: to the logic system itself, or to a computer program, they are simply an opaque chunk of truth (or falsity) called "A" or whatever. When we humans read the description of A, we realize that it has to do with the location of a particular institution of higher education, and with the state of the union that it might reside (or not reside) in. All this is invisible to an artificially intelligent agent, however, which treats "A" as nothing more than a stand-in label for a statement that has no further discernible structure. 

So things are pretty boring so far. We can define and label propositions, but none of them have any connections to the others. We change that by introducing logical operators (also called logical connectives) with which we can build up compound constructions out of multiple propositions. The six connectives we’ll learn are:
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 215

Context: # 8.1. Propositional Logic

| X | Y | Z | ¬Z | X∨Y | ¬X∨Y | ¬Z∨Z | X∧Z | (¬X∧Z) | (X∧¬Z) |
|---|---|---|----|-----|-------|-------|------|---------|--------|
| 0 | 0 | 0 | 1  | 0   | 1     | 1     | 0    | 0       | 0      |
| 0 | 1 | 0 | 1  | 1   | 0     | 1     | 0    | 0       | 0      |
| 1 | 0 | 0 | 1  | 1   | 1     | 1     | 0    | 0       | 1      |
| 1 | 1 | 0 | 1  | 1   | 1     | 1     | 1    | 1       | 0      |
| 1 | 0 | 1 | 0  | 1   | 1     | 0     | 1    | 0       | 0      |
| 1 | 1 | 1 | 0  | 1   | 1     | 0     | 1    | 0       | 1      |

*Here, **♢** stands for ¬(X∧Y)∧(X∧Z)*  
*Here, **☐** stands for ¬(X∧Y)∨(X∧¬Z)*

---

(If you’re looking for some practice, cranking through this example on your own and then comparing your answers to the above truth table isn’t a bad idea at all.)

You’ll notice that the “answer” column has all 1’s. This means that the expression is always true, no matter what the values of the individual propositions are. Such an expression is called a tautology: it’s always true. The word “tautology” has a negative connotation in regular English usage; it refers to a statement so obvious as to not tell you anything, like “all triangles have three sides,” or “the fatal overdose was deadly.” But in logic, tautologies are quite useful, since they represent reliable identities.

The tautology above was a contrived example, and not useful in practice. Here are some important others, though:

| X | ¬X | X∨¬X |
|---|----|------|
| 0 | 1  | 1    |
| 1 | 0  | 1    |

Sometimes called the **law of the excluded middle**, this identity states that either a proposition or its negative will always be true. (There is no third option.)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 217

Context: # 8.2 Predicate logic

The `∧` operator is analogous to `+` (times), while `∨` corresponds to `+` (plus). In fact, if you look at the truth tables for these two operators again, you’ll see an uncanny resemblance:

| X | Y | X ∧ Y | X ∨ Y |
|---|---|-------|-------|
| 0 | 0 |   0   |   0   |
| 0 | 1 |   0   |   1   |
| 1 | 0 |   0   |   1   |
| 1 | 1 |   1   |   1   |

Except for the `(1)` that I put in parentheses, this truth table is exactly what you’d get if you mathematically multiplied (`∧`) and added (`∨`) the inputs! At some level, logically “and-ing” is multiplying, while “or-ing” is adding. Fascinating.

## 8.2 Predicate logic

Propositional logic can represent a lot of things, but it turns out to be too limiting to be practically useful. That has to do with the atomic nature of propositions. Every proposition is its own opaque chunk of truthhood or falsity, with no way to break it down into constituent parts. 

Suppose I wanted to claim that every state in the union had a governor. To state this in propositional logic, I’d have to create a brand new proposition for each state:

- Let G1 be the proposition that Alabama has a governor.
- Let G2 be the proposition that Alaska has a governor.
- Let G3 be the proposition that Arizona has a governor.
- ...

and then, finally, I could assert:

G1 ∧ G2 ∧ G3 ∧ ... ∧ G50.

That’s a lot of work just to create a whole bunch of individual propositions that are essentially the same. What we need is some
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 219

Context: # 8.2. Predicate Logic

which is perfectly true[^1].

You may recall the word "predicate" from your middle school grammar class. Every sentence, remember, has a subject and a predicate. In “Billy jumps,” “Billy” is the subject, and “jumps” the predicate. In “The lonely boy ate spaghetti with gusto,” we have “the lonely boy” as the subject and “ate spaghetti with gusto” as the predicate. Basically, a predicate is anything that can describe or affirm something about a subject. Imagine asserting “JUMPS(Billy)” and “ATESPAGHETTIWITHGUSTO(lonely boy).”

A predicate can have more than one input. Suppose we define the predicate `IsFanOf` as follows:

Let `IsFanOf(x, y)` be the proposition that x digs the music of rock band y.

Then I can assert:

- `IsFanOf(Stephen, Led Zeppelin)`
- `IsFanOf(Rachel, The Beatles)`
- `IsFanOf(Stephen, The Beatles)`
- `¬IsFanOf(Stephen, The Rolling Stones)`

We could even define `TraveledToByModeInYear` with a bunch of inputs:

Let `TraveledToByModeInYear(p, d, m, y)` be the proposition that person p traveled to destination d by mode m in year y.

The following statements are then true:

- `TraveledToByModeInYear(Stephen, Richmond, car, 2017)`

> "By the way, when I say you can give any input at all to a predicate, I mean any individual element from the domain of discourse. I don’t mean that a set of elements can be an input. This limitation is why it’s called ‘first-order’ predicate logic. If you allow sets to be inputs to predicates, it’s called ‘second-order predicate logic’, and can get quite messy." 

[^1]: The footnote text if needed.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 222

Context: 214  
# CHAPTER 8. LOGIC

So these two cases both result in true. But perhaps surprisingly, we also get true for oatmeal:

- **Human**(oatmeal) → **Adult**(oatmeal) ⊕ **Child**(oatmeal)  
  false → false ⊕ false  
  false → false  
  true ✓  

Whoa, how did true pop out of that? Simply because the premise was false, and so all bets were off. We effectively said “if a bowl of oatmeal is human, then it will either be an adult or a child. But it’s not, so never mind!” Put another way, the bowl of oatmeal did not turn out to be a counterexample, and so we’re confident claiming that this expression is true “for all \( x \): \( \forall \)”.

The other kind of quantifier is called the **existential quantifier**. As its name suggests, it asserts the existence of something. We write it \( \exists \) and pronounce it “there exists.” For example,

- \( \exists \, \text{HasGovernor}(x) \) asserts that there is at least one state that has a governor. This doesn’t tell us how many states this is true for, and in fact despite their name, quantifiers really aren’t very good at “quantifying” things for us, at least numerically. As of 2008, the statement

- \( \text{President}(x) \land \text{African-American}(x) \)

is true, and always will be, no matter how many more African-American U.S. presidents we have. Note that in compound expressions like this, a variable (like \( x \)) always stands for a single entity wherever it appears. For hundreds of years there existed African-Americans, and there have existed Presidents, so the expression above would be ridiculously obvious if it meant only “there have been Presidents, and there have been African-Americans.” But the same variable \( x \) being used as inputs to both predicates is what seals the deal and makes it represent the much stronger statement: “there is at least one individual who is personally both African-American and President of the United States at the same time.”
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 224

Context: # CHAPTER 8. LOGIC

## Order matters

When you're facing an intimidating morass of \( \forall \)s and \( \exists \)s and \( \vee \)s and \( \wedge \)s, and God knows what else, it's easy to get lost in the sauce. But you have to be very careful to dissect the expression to find out what it means. Consider this one:

\[
\forall x \in \mathbb{R}, \exists y \in \mathbb{R} \quad y = x + 1.
\] (8.5)

This statement is true. It says that for every single real number (call it \( x \)), it's true that you can find some other number (call it \( y \)) that's one greater than it. If you generate some examples it's easy to see this is true. Suppose we have the real number \( x = 5 \). Is there some other number \( y \) that's equal to \( 2 + x \)? Of course, the number 6. What if \( x = -32.4 \)? Is there a number \( y \) that satisfies this equation? Of course, \( y = -31.4 \). Obviously not matter what number \( x \) we choose, we can find the desired number \( y \) just by adding one. Hence this statement is true for all \( x \), just like it says.

What happens, though, if we innocently switch the order of the quantifiers? Let's try asserting this:

\[
\exists y \in \mathbb{R}, \forall x \in \mathbb{R} \quad y = x + 1.
\] (8.6)

Is this also true? Look carefully. It says “there exists some magic number \( y \) that has the following amazing property: no matter what value of \( x \) you choose, this \( y \) is one greater than \( x \)!” Obviously this is not true. There is no such number \( y \). If I choose \( y = 13 \), that works great as long as I choose \( x = 12 \), but for any other choice of \( x \), it’s dead in the water.

## The value of precision

This fluency with the basic syntax and meaning of predicate logic was our only goal in this chapter. There are all kinds of logical rules that can be applied to predicate logic statements in order to
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 226

Context: # Chapter 8: Logic

Here's a funny one I’ll end with. Consider the sentence **“He made her duck.”** What is intended here? Did some guy reach out with his hand and forcefully push a woman’s head down out of the way of a screaming projectile? Or did he prepare a succulent dish of roasted fowl to celebrate her birthday? Oh, if the computer could only know. If we’d used predicate logic instead of English, it could!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 230

Context: # CHAPTER 8. LOGIC

|   | Statement                                                                                              |
|---|-------------------------------------------------------------------------------------------------------|
| 35| True or false: ∀x Professor(x) ⇔ Human(x). <br> **True!** This is what we were trying to say all along. Every professor is a person. |
| 36| True or false: ¬∃x Professor(x) → ¬Human(x). <br> **True!** This is an equivalent statement to item 35. There's nothing in the universe that is a professor yet not a human. (At least, at the time of this writing.) |
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 231

Context: # Chapter 9

## Proof

We've seen a lot of pretty sights on our cool brisk walk. We've caught a glimpse of the simple elegance of sets and relations, the precision of probabilistic reasoning, the recursive structure of trees, the explosive nature of combinatorics, and much more. None of these things have we plumbed to the depths, but we've appreciated their beauty and taken note of where they stood along our blazed trail. You’ll remember this hike when you run into such concepts again and again in future computer science and math courses, and in your career beyond academics.

Now we have one more step to make before returning to the trailhead, and that deals with the notion of **proof**. As we've studied these various mathematical entities, I’ve pointed out certain of their properties. A free tree has one more vertex than edge, for example. The cardinality of the union of two sets is at least as big as each of their individual unions. If you flip-all-the-bits-and-add-one in a two's complement scheme, and then perform that flip-and-add operation again, you’ll return to the original number. But with a few exceptions, we haven't proven any of these things. I’ve just stated them, and you’ve taken them on faith.

In order to establish reliable truth, of course, professional mathematicians aren’t satisfied with unsubstantiated statements. They need to be convinced that the claims we make do truly hold, and provably so, in all circumstances. What they seek is a **proof** of a...
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 233

Context: # 9.1. PROOF CONCEPTS

The phone is in my pocket, and has not rung, and I conclude that the plan has not changed. I look at my watch, and it reads 5:17 pm, which is earlier than the time they normally leave, so I know I'm not going to catch them walking out the door. This is, roughly speaking, my thought process that justifies the conclusion that the house will be empty when I pull into the garage.

Notice, however, that this prediction depends precariously on several facts. What if I spaced out the day of the week, and this is actually Thursday? All bets are off. What if my cell phone battery has run out of charge? Then perhaps they did try to call me but couldn’t reach me. What if I set my watch wrong and it’s actually 4:17 pm? Etc. Just like a chain is only as strong as its weakest link, a whole proof falls apart if even one step isn’t reliable.

Knowledge bases in artificial intelligence systems are designed to support these chains of reasoning. They contain statements expressed in formal logic that can be examined to deduce only the new facts that logically follow from the old. Suppose, for instance, that we had a knowledge base that currently contained the following facts:

1. \( A = C \)
2. \( \neg (C \land D) \)
3. \( (F \lor E) \to D \)
4. \( A \lor B \)

These facts are stated in propositional logic, and we have no idea what any of the propositions really mean, but then neither does the computer, so hey. Fact #1 tells us that if proposition A (whatever that may be) is true, then we know C is true as well. Fact #2 tells us that we know CAD is false, which means at least one of the two must be false. And so on. Large knowledge bases can contain thousands or even millions of such expressions. It’s a complete record of everything the system “knows.”

Now suppose we learn an additional fact: \( \neg B \). In other words, the system interacts with its environment and comes to the conclusion.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 235

Context: ```
9.2 TYPES OF PROOF
==================

to be true, and so it is legal grounds from which to start. A proof can't even get off the ground without axioms. For instance, in step 1 of the above proof, we noted that either A or B must be true, and so if B isn't true, then A must be. But we couldn't have taken this step without knowing that disjunctive syllogism is a valid form of reasoning. It's not important to know all the technical names of the rules that I included in parentheses. But it is important to see that we made use of an axiom of reasoning on every step, and that if any of those axioms were incorrect, it could lead to a faulty conclusion.

When you create a valid proof, the result is a new bit of knowledge called a theorem which can be used in future proofs. Think of a theorem like a subroutine in programming: a separate bit of code that does a job and can be invoked at will in the course of doing other things. One theorem we learned in chapter 2 was the distributive property of sets; that is, \( X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z) \). This can be proven through the use of Venn diagrams, but once you've proven it, it's accepted to be true, and can be used as a "given" in future proofs.

## 9.2 Types of Proof

There are a number of accepted "styles" of doing proofs. Here are some important ones:

### Direct Proof

The examples we've used up to now have been direct proofs. This is where you start from what's known and proceed directly by positive steps towards your conclusion.

Direct proofs remind me of a game called "word ladders," invented by Lewis Carroll, that you might have played as a child:

```
WARM
  |
  |
? ? ? ?
  |
  |
COLD
```
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 239

Context: # 9.3 Proof by Induction

sides of the equation (a perfectly legal thing to do):

\[
\left(\frac{a}{b}\right)^2 = \sqrt{2}
\]

\[
\frac{a^2}{b^2} = (\sqrt{2})^2
\]

\[
a^2 = 2
\]
\[
a^2 = 2b^2.
\]

Now if \(a^2\) equals 2 times something, then \(a^2\) is an even number. But \(a^2\) can’t be even unless itself is even. (Think hard about that one.) This proves, then, that \(a\) is even. Very well. It must be equal to twice some other integer. Let’s call that \(c\). We know that \(a = 2c\), where \(c\) is another integer. Substitute that into the last equation and we get:

\[
(2c)^2 = 2b^2
\]

\[
4c^2 = 2b^2
\]

\[
2c^2 = b^2.
\]

So it looks like \(b^2\) must be an even number as well (since it’s equal to 2 times something), and therefore \(b\) is also even. But wait a minute. We started by saying that \(a\) and \(b\) had no common factor. And now we’ve determined that they’re both even numbers! This means they both have a factor of 2, which contradicts what we started with. The only thing we introduced that was questionable was the notion that there are two integers \(a\) and \(b\) whose ratio was equal to \(\sqrt{2}\) to begin with. That must be the part that’s faulty then. Therefore, \(\sqrt{2}\) is not an irrational number. Q.E.D.

## 9.3 Proof by Induction

One of the most powerful methods of proof — and one of the most difficult to wrap your head around — is called mathematical induction, or just “induction” for short. I like to call it “proof by induction.”
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 240

Context: 232  
# CHAPTER 9. PROOF  

“recursion,” because this is exactly what it is. Remember that we discussed recursion in the context of rooted trees (see p.116). A tree can be thought of as a node with several children—each of which are, in turn, trees. Each of them is the root node of a tree comprised of yet smaller trees, and so on and so forth. If you flip back to the left-hand side of Figure 5.16 on p.113, you’ll see that A is the root of one tree, and its two children, F and B, are roots of their own smaller trees in turn. If we were to traverse this tree in (say) pre-order, we’d visit the root, then visit the left and right subtrees in turn, treating each of them as their own tree. In this way we’ve broken up a larger problem (traversing the big tree) into smaller problems (traversing the smaller trees F and B). The A node has very little to do: it just visits itself, then defers all the rest of the work onto its children. This idea of pawning off most of the work onto smaller subproblems that you trust will work is key to the idea of inductive proofs.

Mathematical induction is hard to wrap your head around because it feels like cheating. It seems like you never actually prove anything: you defer all the work to someone else, and then declare victory. But the chain of reasoning, though delicate, is strong as iron.

## Casting the problem in the right form  

Let’s examine that chain. The first thing you have to be able to do is express the thing you’re trying to prove as a predicate about natural numbers. In other words, you need to form a predicate that has one input, which is a natural number. You’re setting yourself up to prove that the predicate is true for all natural numbers. (Or at least, all natural numbers of at least a certain size.)

Suppose I want to prove that in the state of Virginia, all legal drinkers can vote. Then I could say “let VOTE(n) be the proposition that a citizen of age n can vote.”

If I want to prove an algebraic identity, like  

\[
\sum_{i=1}^{n} \frac{i}{n+1} = \frac{n(n+1)}{2}, 
\]  

then I have to figure out which variable is the one that needs to vary across the natural numbers. In this case it’s the \( n \) variable in my equation.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 244

Context: # Example 2

A famous story tells of Carl Friedrich Gauss, perhaps the most brilliant mathematician of all time, getting in trouble one day as a schoolboy. As punishment, he was sentenced to tedious work: adding together all the numbers from 1 to 100. To his teacher's astonishment, he came up with the correct answer in a moment, not because he was quick at adding integers, but because he recognized a trick. The first number on the list (1) and the last (100) add up to 101. So do 3 and 98, and so do 4 and 97, etc., all the way up to 50 and 51. So really what you have here is 50 different sums of 101 each, so the answer is \( 50 \times 101 = 5050 \). In general, if you add the numbers from 1 to \( x \), where \( x \) is any integer at all, you'll get \( \frac{x(x + 1)}{2} \).

Now, use mathematical induction to prove that Gauss was right (i.e., that \( \sum_{i=1}^{x} i = \frac{x(x + 1)}{2} \)) for all numbers \( x \).

First, we have to cast our problem as a predicate about natural numbers. This is easy: say “let \( P(n) \) be the proposition that 

\[
\sum_{i=1}^{n} i = \frac{n(n + 1)}{2} 
\]

Then, we satisfy the requirements of induction:

1. **Base Case**: We prove that \( P(1) \) is true simply by plugging it in. Setting \( n = 1 \), we have:

   \[
   \sum_{i=1}^{1} i = \frac{1(1 + 1)}{2}
   \]
   \[
   1 = \frac{1(2)}{2}
   \]
   \[
   1 = 1 \quad \text{✓}
   \]

2. **Inductive Step**: We now must prove that \( P(k) \) implies \( P(k + 1) \). Put another way, we assume \( P(k) \) is true, and then use that assumption to prove that \( P(k + 1) \) is also true.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 245

Context: # 9.3. PROOF BY INDUCTION

Let’s be crystal clear where we’re going with this. Assuming that \( P(k) \) is true means we can count on the fact that 

\[
1 + 2 + 3 + \cdots + k = \frac{k(k + 1)}{2}
\]

What we need to do, then, is prove that \( P(k + 1) \) is true, which amounts to proving that 

\[
1 + 2 + 3 + \cdots + (k + 1) = \frac{(k + 1)(k + 2)}{2}
\]

Very well. First we make the inductive hypothesis, which allows us to assume:

\[
1 + 2 + 3 + \cdots + k = \frac{k(k + 1)}{2}
\]

The rest is just algebra. We add \( k + 1 \) to both sides of the equation, then multiply things out and factor it all together. Watch carefully:

\[
1 + 2 + 3 + \cdots + k + (k + 1) = \frac{k(k + 1)}{2} + (k + 1)
\]

\[
= \frac{k^2}{2} + \frac{1}{2}(k + 1) + (k + 1)
\]

\[
= \frac{k^2}{2} + \frac{3}{2}(k + 1)
\]

\[
= \frac{k^2 + 3k + 2}{2}
\]

\[
= \frac{(k + 1)(k + 2)}{2}
\]

Therefore, \( \forall n \geq 1 \, P(n) \).

## Example 3

Another algebra one. You learned in middle school that \( (ab)^n = a^n b^n \). Prove this by mathematical induction.

**Solution:** Let \( P(n) \) be the proposition that \( (ab)^n = a^n b^n \).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 251

Context: # 9.3. PROOF BY INDUCTION

## 2. Inductive Step
We now must prove that \((\forall i \leq k) P(i) \Rightarrow P(k + 1)\). Put another way, we assume that all trees smaller than the one we’re looking at have one more node than edge, and then use that assumption to prove that the tree we’re looking at also has one more node than edge.

We proceed as follows. Take any free tree with \(k + 1\) nodes. Removing any edge gives you two free trees, each with \(k\) nodes or less. (Why? Well, if you remove any edge from a free tree, the nodes will no longer be connected, since a free tree is “minimally connected” as it is. And we can't break it into more than two trees by removing a single edge, since the edge connects exactly two nodes and each group of nodes on the other side of the removed edge are still connected to each other.)

Now the sum of the nodes in these two smaller trees is still \(k + 1\). (This is because we haven't removed any nodes from the original free tree — we've simply removed an edge.) If we let \(k_1\) be the number of nodes in the first tree, and \(k_2\) the number of nodes in the second, we have \(k_1 + k_2 = k + 1\).

Okay, but how many edges does the first tree have? Answer: \(k_1 - 1\). How do we know that? By the inductive hypothesis. We’re assuming that any tree smaller than \(k + 1\) nodes has one less edge than node, and so we’re taking advantage of that (legal) assumption here. Similarly, the second tree has \(k_2 - 1\) edges.

Bingo. Removing one edge from our original tree of \(k + 1\) nodes gave us a total of \(k - 1\) edges. Therefore, that original tree must have had \(k\) edges. We have now proven that a tree of \(k + 1\) nodes has \(k\) edges, assuming that all smaller trees also have one less edge than node.

## 3. Conclusion
Therefore, by the strong form of mathematical induction, \(\forall n \geq 1, P(n)\).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 252

Context: # 9.4 Final Word

Finding proofs is an art. In some ways, it’s like programming: you have a set of building blocks, each one defined very precisely, and your goal is to figure out how to assemble those blocks into a structure that starts with only axioms and ends with your conclusion. It takes skill, patience, practice, and sometimes a little bit of luck.

Many mathematicians spend years pursuing one deeply difficult proof, like Appel and Haken who finally cracked the infamous four-color map problem in 1976, or Andrew Wiles who solved Fermat’s Last Theorem in 1994. Some famous mathematical properties may never have proofs, such as Christian Goldbach’s 1742 conjecture that every even integer is the sum of two primes; or the most elusive and important question in computing theory: does P=NP? (Put very simply: if you consider the class of problems where it’s easy to verify a solution once you have it, but crazy hard to find it in the first place, is there actually an easy algorithm for finding the solution that we just haven’t figured out yet?) Most computer scientists think “no,” but despite a mind-boggling number of hours invested by the brightest minds in the world, no one has ever been able to prove it one way or the other.

Most practicing computer scientists spend time taking advantage of the known results about mathematical objects and structures, and rarely (if ever) have to construct a watertight proof about them. For the more theoretically-minded student, however, who enjoys probing the basis behind the tools and speculating about additional properties that might exist, devising proofs is an essential skill that can also be very rewarding.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 253

Context: # Index

- n-choose-k notation, 156
- n-to-the-k-falling operator, 152
- a priori, 68
- modus ponens, 201, 226
- modus tollens, 226
- quad erat demonstrandum (Q.E.D.), 226
- reductio ad absurdum, 229
- acyclic (graphs), 91
- additivity property, 63
- adjacent (vertices), 89
- algorithm, 97, 127, 128, 132, 149, 150
- Ali, Muhammad, 92
- American Idol, 62, 68
- ancestor (of a node), 115
- and (logical operator), 18, 199, 203
- antisymmetric (relation), 40, 43
- Appel, Kenneth, 244
- arrays, 13
- artificial intelligence (AI), 197, 201, 225
- associative, 20
- asymmetric (relation), 41
- ATM machines, 143
- atomic (propositions), 198
- AVL trees, 133
- axioms, 226, 229
- background knowledge, 68, 70
- balancedness (of a tree), 132
- base case (of a proof), 233, 240
- bases (of number systems), 166, 168, 170
- Bayes' Theorem, 75
- Bayes, Thomas, 67
- Bayesian, 66
- BFT (breadth-first traversal), 95, 97
- Big-O notation, 127
- bijective (function), 49
- binary numbers, 25, 177, 178, 180, 182
- binary search trees, 123, 125
- binomial coefficients, 156
- bit, 177
- Booth, John Wilkes, 86
- BST property, 125, 131
- byte, 180
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 254

Context: ```
# CHAPTER 9. PROOF

- Cantor, Georg, 7, 12, 17
- capacity (of a byte), 182
- cardinality (of sets), 16, 25, 28, 66
- Carroll, Lewis, 227
- carry-in, 189
- carry-out, 189
- Cartesian product (of sets), 
  - 19, 35
- chess, 114
- child (of a node), 115
- closed interval, 61
- codomain (of a function), 45
- collectively exhaustive, 26
- combinations, 154
- combinators, 25, 141
- commutative, 18, 20, 71
- compilers, 114
- complement laws (of sets), 
  - 21
- complement, partial (of sets), 
  - 18
- complement, total (of sets), 
  - 18, 65, 146, 162
- complete binary tree, 121
- conclusion (of implication), 200
- conditional probability, 
  - 68, 72, 74, 78
- congruent, 173
- conjunction, 199, 208
- connected (vertices/graphs), 
  - 89, 95
- coordinates, 15
- curly brace notation, 11
- cycles, 90

## DAGs (directed acyclic graphs), 90

- data structures, 85
- Davies family, 8, 9, 26, 147, 154
- De Morgan's laws, 21, 22, 207, 208
- decimal numbers, 165, 169, 173, 178
- degree (of a vertex), 90
- depth (of a node), 115
- dequeuing, 95
- descendant (of a node), 115
- DFT (depth-first traversal), 99, 101

## Dijkstra's algorithm, 101, 104
- Dijkstra, Edsger, 101
- direct proof, 227
- directed graphs, 88, 91
- disjunction, 199, 208, 226
- disjunctive syllogism, 226
- disk sectors, 156
- distributive, 20, 208, 227
- domain (of a function), 45
- domain of discourse (?), 9, 
  - 19, 21, 27, 60, 210
- domination laws (of sets), 21
- dominors, 234
- drinking age, 232, 234
- duplicates (in sets), 13
- edges, 86, 87
- elements (of sets), 8, 15, 23
- ellipses, 12
- empty graph, 87
- empty set, 9, 16, 21, 24, 25, 
  - 36, 114
- endorelations, 38, 93
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 257

Context: 9.4. FINAL WORD
=================

- ordered triples, 15
- org charts, 113
- outcomes, 60, 62
- overflow, 188

P = NP?, 244  
parent (of a node), 114  
partial orders, 43  
partial permutations, 151, 154  
partitions, 26, 71, 94  
Pascal's triangle, 157  
passwords, 146  
paths (in a graph), 88, 113  
perfect binary tree, 122, 239  
permutations, 147  
PINs, 143  
poker, 160  
pop (off a stack), 99  
posts, 43  
post-order traversal, 118  
postulates, 226  
power sets, 24, 36  
pre-order traversal, 117  
predicates, 210, 211, 232  
predicate logic, 210  
premise (of implication), 200  
Prim's algorithm, 107  
Prime, Robert, 107  
prior probability, 68  
probability measures, 61, 63, 65  
product operator (II), 142, 160  
proof, 223  
proof by contradiction, 229  
propositional logic, 197, 225  
propositions, 197, 210  

- psychology, 70, 86
- push (on a stack), 99
- quantifiers, 212, 215
- queue, 95, 97
- quotient, 173, 174

- range (of a function), 48
- rational numbers (ℝ), 17, 24
- reachable, 89
- real numbers (ℝ), 71, 94
- rebalancing (a tree), 132
- recursion, 116, 120, 149, 231
- red-black trees, 133
- reflexive (relation), 40, 43
- relations, finite, 39
- relations, infinite, 39
- remainder, 173, 174
- right child, 116  
- root (of a tree), 112, 114  
- rooted trees, 112, 134  
- Russell's paradox, 15  

- sample space (Ω), 60  
- semantic network, 87  
- set operators, 18  
- set-builder notation, 11  
- sets, 8, 93  
- sets of sets, 15  
- sets, finite, 12  
- sets, fuzzy, 10  
- sets, infinite, 12  
- sibling (of a node), 115  
- signs-magnitude binary numbers, 183, 189  

**Sonic the Hedgehog**, 73  
- southern states, 72  
- spatial positioning, 92, 113
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 3

Context: # Learning Theory

## Table of Contents

3.4 Bayesian Learning ......................................... 63  
3.5 Supervised Learning via Generalized Linear Models (GLM) ... 69  
3.6 Maximum Entropy Property* ................................. 71  
3.7 Energy-based Models* ...................................... 72  
3.8 Some Advanced Topics* ..................................... 73  
3.9 Summary ................................................... 74  

## II Supervised Learning

### 75  

4 Classification ................................................ 76  
4.1 Preliminaries: Stochastic Gradient Descent ............ 77  
4.2 Classification as a Supervised Learning Problem ...... 78  
4.3 Discriminative Deterministic Models ..................... 80  
4.4 Discriminative Probabilistic Models: Generalized Linear Models ............................................... 90  
4.5 Discriminative Probabilistic Models: Beyond GLM ..... 96  
4.6 Generative Probabilistic Models .......................... 102  
4.7 Boosting .................................................... 106  
4.8 Summary ................................................... 107  

## 5 Statistical Learning Theory*

### 113  

5.1 A Formal Framework for Supervised Learning .......... 114  
5.2 PAC Learnability and Sample Complexity ............... 119  
5.3 PAC Learnability for Finite Hypothesis Classes ....... 120  
5.4 VC Dimension and Fundamental Theorem of PAC Learning .................................................... 124  
5.5 Summary ................................................... 126  

## III Unsupervised Learning

### 129  

6 Unsupervised Learning ....................................... 130  
6.1 Unsupervised Learning ................................... 131  
6.2 K-Means Clustering ...................................... 134  
6.3 ML, ELBO and EM ......................................... 136  
6.4 Directed Generative Models ............................. 148  
6.5 Undirected Generative Models ............................ 155
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 8

Context: 2. Notation
==============

- The notation log represents the logarithm in base two, while ln represents the natural logarithm.
- \( \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \) indicates that random vector \(\mathbf{x}\) is distributed according to a multivariate Gaussian distribution with mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\). The multivariate Gaussian pdf is denoted as \( \mathcal{N}(\mathbf{x} | \boldsymbol{\mu}, \Sigma) \) as a function of \(\mathbf{x}\).
- \( \mathbf{x} \sim U[a,b] \) indicates that \(\mathbf{x}\) is distributed according to a uniform distribution in the interval [a, b]. The corresponding uniform pdf is denoted as \( U(a,b) \).
- \( \delta(x) \) denotes the Dirac delta function or the Kronecker delta function, as clear from the context.
- \( \|\mathbf{a}\|_2^2 = \sum_{i=1}^N a_i^2 \) is the quadratic, or \( l_2 \) norm of a vector \(\mathbf{a} = [a_1, \ldots, a_N]^T\). We similarly define the \( l_1 \) norm as \( \|\mathbf{a}\|_1 = \sum_{i=1}^N |a_i| \), and the \( l_0 \) pseudo-norm \( \|\mathbf{a}\|_0 \) as the number of non-zero entries of vector \(\mathbf{a}\).
- \( I \) denotes the identity matrix, whose dimensions will be clear from the context. Similarly, I represents a vector of all ones.
- \( \mathbb{R} \) is the set of real numbers; \( \mathbb{R}^+ \) the set of non-negative real numbers; \( \mathbb{R}^- \) the set of non-positive real numbers; and \( \mathbb{R}^N \) is the set of all vectors of \( N \) real numbers.
- \( |S| \) represents the cardinality of a set \( S \).
- \( \mathbf{Z} \) represents a set of random variables indexed by the integers \( k \in S \).
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 13

Context: # 1.1. What is Machine Learning?

This starts with an in-depth analysis of the problem domain, which culminates with the definition of a mathematical model. The mathematical model is meant to capture the key features of the problem under study and is typically the result of the work of a number of experts. The mathematical model is finally leveraged to derive hand-crafted solutions to the problem.

For instance, consider the problem of defining a chemical process to produce a given molecule. The conventional flow requires chemists to leverage their knowledge of models that predict the outcome of individual chemical reactions, in order to craft a sequence of suitable steps that synthesize the desired molecule. Another example is the design of speech translation or image/video compression algorithms. Both of these tasks involve the definition of models and algorithms by teams of experts, such as linguists, psychologists, and signal processing practitioners, not infrequently during the course of long standardization meetings.

The engineering design flow outlined above may be too costly and inefficient for problems in which faster or less expensive solutions are desirable. The machine learning alternative is to collect large data sets, e.g., of labeled speech, images, or videos, and to use this information to train general-purpose learning machines to carry out the desired task. While the standard engineering flow relies on domain knowledge and on design optimized for the problem at hand, machine learning lets large amounts of data dictate algorithms and solutions. To this end, rather than requiring a precise model of the set-up under study, machine learning requires the specification of an objective, of a model to be trained, and of an optimization technique.

Returning to the first example above, a machine learning approach would proceed by training a general-purpose machine to predict the outcome of known chemical reactions based on a large data set, and then by using the trained algorithm to explore ways to produce more complex molecules. In a similar manner, large data sets of images or videos would be used to train a general-purpose algorithm with the aim of obtaining compressed representations from which the original input can be recovered with some distortion.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 17

Context: # 1.3 Goals and Outline

This monograph considers only passive and offline learning.

## 1.3 Goals and Outline

This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. Unlike other texts that are focused on one particular aspect of the field, an effort has been made here to provide a broad but concise overview in which the main ideas and techniques are systematically presented. Specifically, the material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra. A prior exposure to information theory is useful but not required.

Detailed discussions are provided on basic concepts and ideas, including overfitting and generalization, Maximum Likelihood and regularization, and Bayesian inference. The text also endeavors to provide intuitive explanations and pointers to advanced topics and research directions. Sections and subsections containing more advanced material that may be regarded at a first reading are marked with a star (+).

The reader will find here further discussions on computing platform or programming frameworks, such as map-reduce, nor details on specific applications involving large data sets. These can be easily found in a vast and growing body of work. Furthermore, rather than providing exhaustive details on the existing myriad solutions in each specific category, techniques have been selected that are useful to illustrate the most salient aspects. Historical notes have also been provided only for a few selected milestone events.

Finally, the monograph attempts to strike a balance between the algorithmic and theoretical viewpoints. In particular, all learning algorithms are presented in a manner that emphasizes their theoretical foundations while also providing practical insights.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 49

Context: # 2.5. Minimum Description Length (MDL)

In a gray scale image, fix some probability mass function (pmf) \( p(x) \) on this alphabet. A key result in information theory states that it is possible to design a lossless compression scheme that uses \( [-\log p(x)] \) bits to represent value \( x \).

By virtue of this result, the choice of a probability distribution \( p(x) \) is akin to the selection of a lossless compression scheme that produces a description of around \( -\log p(x) \) bits to represent value \( x \). Note that the description length \( -\log p(x) \) decreases with the probability assigned by \( p(x) \) to value \( x \): more likely values under \( p(x) \) are assigned a smaller description. Importantly, a decoder would need to know \( p(x) \) in order to recover \( x \) from the bit description.

At an informal level, the MDL criterion prescribes the selection of a model that compresses the training data to the shortest possible description. In other words, the model selected by MDL defines a compression scheme that describes the data set \( D \) with the minimum number of bits. As such, the MDL principle can be thought of as a formalization of Occam’s razor: choose the model that yields the simplest explanation of the data. As we will see below, this criterion naturally leads to a solution that penalizes overfitting.

### What is the length of the description of a data set \( D \) that results from the selection of a specific value of \( M \)?

The answer is not straightforward, since, for a given value of \( M \), there are as many probability distributions as there are values for the corresponding parameters \( \theta \) to choose from. A formal calculation of the description length would hence require the introduction of the concept of universal compression for a given probabilistic model \( \theta \). Here, we will limit ourselves to a particular class of universal codes known as two-part codes.

Using two-part codes, we can compute the description length for the data \( D \) that results from the choice of a model over \( M \) as follows. First, we obtain the ML solution \( (m_{ML}, \hat{\theta}_{ML}) \). Then, we describe the data set by using a compression scheme defined by the probability \( p(x | m_{ML}, \hat{\theta}_{ML}) \). As discussed, this pro-
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 57

Context: # 3  
## Probabilistic Models for Learning  

In the previous chapter, we have introduced the frequentist, Bayesian, and MDL learning frameworks. As we have seen, parametric probabilistic models play a key role for all three of them. The linear regression example considered in the previous chapter was limited to a simple linear Gaussian model, which is insufficient to capture the range of learning problems that are encountered in practice. For instance, scenarios of interest may encompass discrete variables or non-negative quantities.  

In this chapter, we introduce a family of probabilistic models, known as the **exponential family**, whose members are used as components in many of the most common probabilistic models and learning algorithms. The treatment here will be leveraged in the rest of the monograph in order to provide the necessary mathematical background. Throughout this chapter, we will specifically emphasize the common properties of the models in the exponential family, which will prove useful for deriving learning algorithms in the following chapters.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 58

Context: # 3.1 Preliminaries

We start with a brief review of some definitions that will be used throughout the chapter and elsewhere in the monograph (see [28] for more details). Readers with a background in convex analysis and calculus may just review the concept of sufficient statistic in the last paragraph.

First, we define a **convex set** as a subset of \(\mathbb{R}^D\), for some \(D\), that contains all segments between any two points in the set. Geometrically, convex sets hence cannot have “indentations.” Function \(f(x)\) is convex if its domain is a convex set and if it satisfies the inequality \(f(\lambda x_1 + (1 - \lambda)x_2) \leq \lambda f(x_1) + (1 - \lambda)f(x_2)\) for all \(x_1\) and \(x_2\) in its domain and for all \(0 \leq \lambda \leq 1\). Geometrically, this condition says that the function is “U”-shaped: the curve defining the function cannot be above the segment obtained by connecting any two points on the curve. A function is strictly convex if the inequality above is strict except for \(\lambda = 0\) or \(\lambda = 1\); a concave, or strictly concave, function is defined by reversing the inequality above – it is hence “n-shaped.”

The minimization of a convex (“U”) function over a convex constraint set or the maximization of a concave (“n”) function over a convex constraint set are known as convex optimization problems. For these problems, there exist powerful analytical and algorithmic tools to obtain globally optimal solutions [28].

We also introduce two useful concepts from calculus. The **gradient** of a differentiable function \(f(x)\) with \(x = [x_1, \ldots, x_D]^T \in \mathbb{R}^D\) is defined as the \(D \times 1\) vector \(\nabla f(x) = [\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_D}]^T\) containing all partial derivatives. At any point \(x\) in the domain of the function, the gradient is a vector that points to the direction of locally maximal increase of the function. The Hessian \(\nabla^2 f(x)\) is the \(D \times D\) matrix with \((i,j)\) element given by the second-order derivative \(\frac{\partial^2 f(x)}{\partial x_i \partial x_j}\). It captures the local curvature of the function.

1. A statistic is a function of the data.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 64

Context: ```markdown
setting \( C = 2 \) recovers the Bernoulli distribution. PMFs in this model are given as:

\[
\text{Cat}(x| \mu) = \prod_{k=0}^{C-1} \mu_k^{I(x=k)} \times (1 - \sum_{k=1}^{C-1} \mu_k) \quad (3.16)
\]

where we have defined \( \mu_k = \Pr[x = k] \) for \( k = 1, \ldots, C - 1 \) and \( \mu_0 = 1 - \sum_{k=1}^{C-1} \mu_k = \Pr[x = 0] \). The log-likelihood (LL) function is given as:

\[
\ln(\text{Cat}(x|\mu)) = \sum_{k=1}^{C-1} I(x=k) \ln \mu_k + \ln \mu_0 \quad (3.17)
\]

This demonstrates that the categorical model is in the exponential family, with sufficient statistics vector \( u(x) = [I(x = 0) \ldots I(x = C - 1)]^T \) and measure function \( m(x) = 1 \). Furthermore, the mean parameters \( \mu = [\mu_1, \ldots, \mu_{C-1}]^T \) are related to the natural parameter vector \( \eta = [\eta_1, \ldots, \eta_{C-1}]^T \) by the mapping:

\[
\eta_k = \ln \left( \frac{\mu_k}{1 - \sum_{j=1}^{C-1} \mu_j} \right) \quad (3.18)
\]

which again takes the form of an LLR. The inverse mapping is given by:

\[
\mu = \left( \frac{e^{\eta_k}}{1 + \sum_{j=1}^{C-1} e^{\eta_j}} \right) \quad (3.19)
\]

The parameterization given here is minimal, since the sufficient statistics \( u(x) \) are linearly independent. An overcomplete representation would instead include in the vector of sufficient statistics also the function \( I(x=0) \). In this case, the resulting vector of sufficient statistics is:

\[
u(x) = 
\begin{cases} 
1 & \text{if } x = 0 \\ 
1 & \text{if } x = C - 1 
\end{cases} \quad (3.20)
\]

which is known as one-hot encoding of the categorical variable, since only one entry equals 1 while the others are zero. Furthermore, with this...
```
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 85

Context: # 4.2. Classification as a Supervised Learning Problem

The binary classification problem is illustrated in Fig. 4.1. Given a training set \( D \) of labeled examples \( x_n, n = 1, \ldots, N \), the problem is to assign a new example \( x \) to either class \( C_0 \) or \( C_1 \). In this particular standard data set, the two variables in each vector \( x \) measure the sepal length and sepal width of an iris flower. The latter may belong to either the setosa or virginica family, as encoded by the label \( t_n \) and represented in the figure with different markers. Throughout, we denote as \( D \) the dimension of the domain point \( x \) (\( D = 2 \) in Fig. 4.1).

![Binary classification illustration](path/to/figure4.1.png)

**Figure 4.1:** Illustration of the binary (\( K = 2 \) classes) classification problem with a domain space of dimension \( D = 2 \): to which class should the new example \( x \) be assigned?

Following the taxonomy introduced in Chapter 2, we can distinguish the following modeling approaches, which will be reviewed in the given order throughout the rest of this chapter:

- **Deterministic deterministic models:** Model directly the deterministic mapping between domain point and label via a parameterized function \( t \approx \hat{t}(x) \).
  
- **Discriminative probabilistic models:** Model the probability of a point \( x \) belonging to class \( C_k \) via a parameterized conditional pmf \( p(t|x) \), with the relationship between \( t \) and \( C_k \) defined in (4.4). We will also
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 130

Context: # 5.4 VC Dimension and Fundamental Theorem of PAC Learning

We have seen that finite classes are PAC learnable with sample complexity proportional to the model capacity in \(|H|\) by using ERM. In this section, we address the following questions: Is \(N_{\text{SRM}}(N,\delta)\) the smallest sample complexity? How can we define the capacity of infinite hypothesis classes? We have discussed at the end of Section 5.3 that the answer to the latter question cannot be found by extrapolating from results obtained when considering finite hypothesis classes. In contrast, we will see here that the answers to both of these questions rely on the concept of VC dimension, which serves as a more fundamental definition of capacity of a model. The VC dimension is defined next.

## Definition 5.4

A hypothesis class \(\mathcal{H}\) is said to shatter a set of domain points \(X = \{x_1,\ldots,x_n\}\) if, no matter how the corresponding labels \(\{y_i \in \{0,1\}\}\) are selected, there exists a hypothesis \(h \in \mathcal{H}\) that ensures \(\hat{y}_{n} = y_n\) for all \(n = 1,\ldots,V\).

## Definition 5.5

The VC dimension \(\text{VCdim}(\mathcal{H})\) of the model \(\mathcal{H}\) is the size of the largest set \(X\) that is shattered by \(\mathcal{H}\).

Based on the definitions above, to prove that a model has VCdim(\(\mathcal{H}\)) = V, we need to carry out the following two steps:

1. **Step 1:** Demonstrate the existence of a set \(X\) with \(|X| = V\) that is shattered by \(\mathcal{H}\);
2. **Step 2:** Prove that no set \(X'\) of dimensions \(V + 1\) exists that is shattered by \(\mathcal{H}\).

The second step is typically seen to be more difficult, as illustrated by the following examples.

## Example 5.3

The threshold function model (5.8) has VCdim(\(\mathcal{H}) = 1\), since there is clearly a set \(X\) of one sample (\(V = 1\)) that can be shattered.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 191

Context: When the undirected graph is not a tree, one can use the **junction tree algorithm** for exact Bayesian inference. The idea is to group subsets of variables together in cliques, in such a way that the resulting graph is a tree. The complexity depends on the treewidth of the resulting graph. When this complexity is too high for the given application, approximate inference methods are necessary. This is the subject of the next chapter.

## 7.5 Summary

Probabilistic graphical models encode a priori information about the structure of the data in the form of causality relationships — via directed graphs and Bayesian networks (BNs) — or mutual affinities — via undirected graphs and Markov Random Fields (MRFs). This structure translates into conditional independence conditions. The structural properties encoded by probabilistic graphical models have the potential advantage of controlling the capacity of a model, hence contributing to the reduction of overfitting at the expense of possible bias effects (see Chapter 5). They also facilitate Bayesian inference (Chapters 2-4), at least in graphs with tree-like structures. Probabilistic graphical models can be used as the underlying probabilistic framework for supervised, unsupervised, and semi-supervised learning problems, depending on which subsets of random variables (rvs) are observed or latent.

While graphical models can reduce the complexity of Bayesian inference, this generally remains computationally infeasible for most models of interest. To address this problem, the next chapter discusses approximate Bayesian inference, as well as associated learning problems (Chapter 6).
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 9

Context: # Learning and Intuition

We have all experienced the situation that the solution to a problem presents itself while riding your bike, walking home, “relaxing” in the washroom, waking up in the morning, taking your shower, etc. Importantly, it did not appear while banging your head against the problem in a conscious effort to solve it, staring at the equations on a piece of paper. In fact, I would claim that all my bits and pieces of progress have occurred while taking a break and “relaxing out of the problem.” Greek philosophers walked in circles when thinking about a problem; most of us stare at a computer screen all day. The purpose of this chapter is to make you more aware of where your creative mind is located and to interact with it in a fruitful manner.

My general thesis is that, contrary to popular belief, creative thinking is not performed by conscious thinking. It is rather an interplay between your conscious mind, which prepares the seeds to be planted into the unconscious part of your mind. The unconscious mind will munch on the problem “out of sight” and return promising roads to solutions to the consciousness. This process iterates until the conscious mind decides the problem is sufficiently solved, intractable, or plain dull and moves on to the next. It may be a little unsettling to learn that at least part of your thinking goes on in a part of your mind that seems inaccessible and has a very limited interface with what you think of as yourself. But it is undeniable that it is there, and it is also undeniable that it plays a role in the creative thought process.

To become a creative thinker, one should learn to play this game more effectively. To do so, we should think about the language in which to represent knowledge that is most effective in terms of communication with the unconscious. In other words, what type of “interface” between conscious and unconscious mind should we use? It is probably not a good idea to memorize all the details of a complicated equation or problem. Instead, we should extract the abstract idea and capture the essence of it in a picture. This could be a movie with colors and other elements.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 15

Context: # 1.1. DATA REPRESENTATION

Most datasets can be represented as a matrix, \( X = [X_{n,k}] \), with rows indexed by "attribute-index" \( i \) and columns indexed by "data-index" \( n \). The value \( X_{n,k} \) for attribute \( i \) and data-case \( n \) can be binary, real, discrete, etc., depending on what we measure. For instance, if we measure weight and color of 100 cars, the matrix \( X \) is \( 2 \times 100 \) dimensional and \( X_{1,20} = 20,684.57 \) is the weight of car nr. 20 in some units (a real value) while \( X_{2,20} = 2 \) is the color of car nr. 20 (say one of 6 predefined colors).

Most datasets can be cast in this form (but not all). For documents, we can give each distinct word of a prespecified vocabulary a number and simply count how often a word was present. Say the word "book" is defined to have nr. 10, 568 in the vocabulary then \( X_{1,1005} = 4 \) would mean: the word book appeared 4 times in document 5076. Sometimes the different data-cases do not have the same number of attributes. Consider searching the internet for images about cats. You’ll retrieve a large variety of images most with a different number of pixels. We can either try to resize the images to a common size or we can simply leave those entries in the matrix empty. It may also occur that a certain entry is supposed to be there but it couldn’t be measured. For instance, if we run an optical character recognition system on a scanned document some letters will not be recognized. We’ll use a question mark “?” to indicate that that entry wasn’t observed.

It is very important to realize that there are many ways to represent data and not all are equally suitable for analysis. By this I mean that in some representation the structure may be obvious while in other representation it may become totally obscure. It is still there, but just harder to find. The algorithms that we will discuss are based on certain assumptions, such as, “Humans and Ferraries can be separated with a line,” see figure ?. While this may be true if we measure weight in kilograms and height in meters, it is no longer true if we decide to re-code these numbers into bit-strings. The structure is still in the data, but we would need a much more complex assumption to discover it. A lesson to be learned is thus to spend some time thinking about in which representation the structure is as obvious as possible and transform the data if necessary before applying standard algorithms. In the next section we’ll discuss some standard preprocessing operations. It is often advisable to visualize the data before preprocessing and analyzing it. This will often tell you if the structure is a good match for the algorithm you had in mind for further analysis. Chapter ? will discuss some elementary visualization techniques.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 23

Context: # Chapter 3

## Learning

This chapter is without question the most important one of the book. It concerns the core, almost philosophical question of what learning really is (and what it is not). If you want to remember one thing from this book, you will find it here in this chapter.

Ok, let’s start with an example. Alice has a rather strange ailment. She is not able to recognize objects by their visual appearance. At home, she is doing just fine: her mother explained to Alice for every object in her house what it is and how you use it. When she is home, she recognizes these objects (if they have not been moved too much), but when she enters a new environment she is lost. For example, if she enters a new meeting room, she needs a long time to infer what the chairs and the table are in the room. She has been diagnosed with a severe case of "overfitting." 

What is the matter with Alice? Nothing is wrong with her memory because she remembers the objects once she has seen them. In fact, she has a fantastic memory. She remembers every detail of the objects she has seen. And every time she sees a new object, she reasons that the object in front of her is surely not a chair because it doesn’t have all the features she has seen in earlier chairs. The problem is that Alice cannot generalize the information she has observed from one instance of a visual object category to other, yet unobserved members of the same category. The fact that Alice’s disease is so rare is understandable; there must have been a strong selection pressure against this disease. 

Imagine our ancestors walking through the savanna one million years ago. A lion appears on the scene. Ancestral Alice has seen lions before, but not this particular one, and it does not induce a fear response. Of course, she has no time to infer the possibility that this animal may be dangerous logically. Alice’s contemporaries noticed that the animal was yellow-brown, had manes, etc., and immediately un-
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 27

Context: # 3.1 In a Nutshell

Learning is all about generalizing regularities in the training data to new, yet unobserved data. It is not about remembering the training data. Good generalization means that you need to balance prior knowledge with information from data. Depending on the dataset size, you can entertain more or less complex models. The correct size of the model can be determined by playing a compression game. 

Learning = generalization = abstraction = compression.
##########

"""QUERY: can you summarize in 3 sentence what is discrete math?"""

Consider the chat history for relevant information. Use all information included. Use as much tokens as needed. Important: If you find information separated by a | in the context, it is a table formatted in Markdown.Important: Take a look at the QUERY and only the QUERY. If this is vague or unclear, please ignore everything and ask a follow-up question instead!


Final Files Sources: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 1, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 3, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 5, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 6, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 9, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 10, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 11, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 12, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 13, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 15, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 17, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 18, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 21, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 23, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 24, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 25, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 27, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 28, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 30, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 31, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 33, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 34, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 36, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 37, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 39, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 40, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 41, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 43, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 49, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 52, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 54, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 55, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 60, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 65, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 67, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 68, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 71, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 75, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 81, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 85, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 89, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 95, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 99, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 100, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 102, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 109, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 119, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 129, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 142, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 144, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 145, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 149, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 150, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 151, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 152, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 155, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 156, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 163, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 165, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 167, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 171, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 175, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 182, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 199, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 201, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 206, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 215, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 217, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 219, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 222, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 224, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 226, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 230, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 231, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 233, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 235, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 239, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 240, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 244, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 245, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 251, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 252, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 253, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 254, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 257, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 3, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 8, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 13, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 17, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 49, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 57, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 58, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 64, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 85, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 130, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 191, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 9, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 15, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 23, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 27
==================================================
**Elapsed Time: 3.54 seconds**
==================================================


FINAL ANSWER 
Answer: Discrete mathematics is a branch of mathematics that deals with distinct and separate objects, focusing on concepts such as sets, functions, and relations, which are foundational for various applications in computer science and information technology. It encompasses topics like combinatorics, graph theory, and logic, emphasizing the study of structures that are fundamentally discrete rather than continuous, such as integers and finite graphs. The principles of discrete mathematics are essential for understanding algorithms, data structures, and the theoretical underpinnings of computer science, making it a critical area of study for computer scientists and engineers.
==================================================
**Elapsed Time: 2.05 seconds**
==================================================