{
    "query": "an you summarize in 2 sentences what is discrete math?",
    "namespace": "e791d701-717d-4c06-9602-2878acdbb6ad",
    "messages": [
        {
            "role": "assistant",
            "content": "Hallo diverse NF SepTest! Ich bin Simon, der KI-Chatbot, der das gro\u00dfe Sprachmodell gpt-4o-mini-2024-07-18 verwendet, um Ihre Fragen zu beantworten. Ich freue mich, Ihnen behilflich zu sein!\n\nHier sind einige Themen, \u00fcber die wir sprechen k\u00f6nnten: \n\n1. \ud83d\udcda Grundlagen der diskreten Mathematik\n2. \ud83d\udcca Einf\u00fchrung in maschinelles Lernen\n3. \ud83d\udd0d Wahrscheinlichkeitsmodelle und deren Anwendungen\n4. \ud83e\uddee Kombinatorik und Z\u00e4hlprinzipien\n5. \ud83d\udcc8 Klassifikationsalgorithmen im maschinellen Lernen\n6. \ud83e\udde0 Nearest Neighbors Klassifikation\n7. \ud83d\udcc9 Support Vector Machines (SVM)\n8. \ud83d\udd17 Bayesianische Netzwerke und Markov-Zufallsfelder\n9. \ud83d\udccf Dimensionale Reduktionstechniken wie PCA\n10. \ud83d\udd04 Reinforcement Learning und seine Anwendungen\n\nGibt es ein bestimmtes Thema auf dieser Liste, das Sie n\u00e4her erkunden m\u00f6chten?"
        },
        {
            "role": "user",
            "content": "halo"
        },
        {
            "role": "assistant",
            "content": "Hallo! Wie kann ich Ihnen helfen? Gibt es eine spezifische Frage oder ein Thema, \u00fcber das Sie mehr erfahren m\u00f6chten?"
        },
        {
            "role": "user",
            "content": "can you summarize in 3 sentence what is discrete math?"
        },
        {
            "role": "assistant",
            "content": "Discrete mathematics is a branch of mathematics that deals with distinct and separate objects, focusing on concepts such as sets, functions, and relations, which are foundational for various applications in computer science and information technology. It encompasses topics like combinatorics, graph theory, and logic, emphasizing the study of structures that are fundamentally discrete rather than continuous, such as integers and finite graphs. The principles of discrete mathematics are essential for understanding algorithms, data structures, and the theoretical underpinnings of computer science, making it a critical area of study for computer scientists and engineers."
        },
        {
            "role": "user",
            "content": "test"
        },
        {
            "role": "assistant",
            "content": "It seems like you're testing the system. Could you please clarify what specific information or assistance you need? I'm here to help!"
        },
        {
            "role": "user",
            "content": "halo"
        },
        {
            "role": "assistant",
            "content": "Could you please clarify your query or provide more details? I'm here to assist you!"
        },
        {
            "role": "user",
            "content": "an you summarize in 2 sentences what is discrete math?"
        },
        {
            "role": "assistant",
            "content": ""
        }
    ],
    "stream": true,
    "language_level": "",
    "chat_channel": ":3997:1726667082838",
    "language": "German",
    "tone": "neutral",
    "writing_style": "standard",
    "model": "gpt-4o-mini-2024-07-18",
    "knowledgebase": "ki-dev-large",
    "seed": 1055,
    "client_id": 1726667082838,
    "all_context": false,
    "follow_up_for": null,
    "knowledgebase_files_count": 3,
    "override_command": null,
    "disable_clarity_check": false,
    "custom_primer": "",
    "logging": true,
    "query_route": ""
}


INITIALIZATION
Knowledgebase: ki-dev-large
Base Query: an you summarize in 2 sentences what is discrete math?
Model: gpt-4o-mini-2024-07-18
Use Curl?: None
==================================================
**Elapsed Time: 0.00 seconds**
==================================================


ROUTING
Query type: summary
==================================================
**Elapsed Time: 2.05 seconds**
==================================================


RAG PARAMETERS
Max Context To Include: 100
Lowest Score to Consider: 0.1
==================================================
**Elapsed Time: 0.14 seconds**
==================================================


VECTOR SEARCH ALGORITHM TO USE 
Use MMR search?: True
Use Similarity search?: False
==================================================
**Elapsed Time: 0.11 seconds**
==================================================


VECTOR SEARCH DONE 
==================================================
**Elapsed Time: 7.05 seconds**
==================================================


PRIMER 
Primer: 
            You are Simon, a highly intelligent personal assistant in a system called KIOS. You are a chatbot that
            can read knowledgebases through the "CONTEXT" that is included in the user's chat message.

        
            Your role is to act as an expert at summarization and analysis.

            In your responses to enterprise users, prioritize clarity, trustworthiness, and appropriate formality.

            Be honest by admitting when a topic falls outside your scope of knowledge, and suggest
            alternative avenues for obtaining information when necessary.

            Make effective use of chat history to avoid redundancy and enhance response relevance, continuously
            adapting to integrate all necessary details in your interactions.

            Use as much tokens as possible to provide a detailed response.
        
==================================================
**Elapsed Time: 0.18 seconds**
==================================================


FINAL QUERY 
Final Query: CONTEXT: ##########
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 1

Context: # A Cool Brisk Walk  
Through Discrete Mathematics

**Version 2.2.1**  
**Stephen Davies, Ph.D.**  

## Table of Contents
1. Introduction
2. Concepts
   - Sets
   - Functions
   - Relations
3. Applications
4. Conclusion

## Introduction
In this document, we explore various topics under discrete mathematics. 

## Concepts

### Sets
- Definition
- Notation
- Operations

### Functions
- Definition
- Types of functions
- Examples

### Relations
- Definition
- Properties

| Concept   | Description                       |
|-----------|-----------------------------------|
| Set      | A collection of distinct objects   |
| Function  | A relation from a set to itself    |
| Relation  | A connection between two sets       |

## Applications
The concepts of discrete mathematics can be applied in various fields, including computer science, information technology, and combinatorial design.

## Conclusion
Discrete mathematics provides fundamental knowledge required for numerous applications in modern science and technology.

Image Analysis: 

### Analysis of the Attached Visual Content

#### 1. Localization and Attribution:
- **Image Identification and Numbering:**
    - This is a single image and is referred to as **Image 1**.

#### 2. Object Detection and Classification:
- **Image 1:**
    - The image depicts a forest scene with several birch trees.
    - **Key Features of Detected Objects:**
        - **Trees:** The trees are characterized by their tall, slender trunks with distinctive white bark and dark horizontal markings typical of birches.
        - **Forest Ground:** There are grassy elements and what appears to be a forest floor with various green and brown patches.

#### 3. Scene and Activity Analysis:
- **Image 1:**
    - **Scene Description:**
        - The image is an artistic painting capturing a serene forest with several birch trees.
        - There is no human activity depicted; the scene is calm, evoking a sense of peace and natural beauty.

#### 4. Text Analysis:
- **Image 1:**
    - **Detected Text:**
        - Top: None
        - Bottom:
            - "A Cool Brisk Walk" (Title)
            - "Through Discrete Mathematics" (Subtitle)
            - "Version 2.2.1" (Version Information)
            - "Stephen Davies, Ph.D." (Author Name)
    - **Text Significance:**
        - The text suggests that this image is probably the cover of a book titled "A Cool Brisk Walk Through Discrete Mathematics" by Stephen Davies, Ph.D. The version number indicates it is version 2.2.1, signifying a possibly updated or revised edition.

#### 8. Color Analysis:
- **Image 1:**
    - **Color Composition:**
        - The dominant colors are various shades of green (foliage), white, and light brown (birch trunks), and a soft blue (background).
    - **Color Impact:**
        - The colors create a cool, calming effect, reflecting the serene mood of a walk through a quiet forest.

#### 9. Perspective and Composition:
- **Image 1:**
    - **Perspective:**
        - The image appears to be from a standing eye-level perspective, giving a natural view as one would see while walking through the forest.
    - **Composition:**
        - The birch trees are placed prominently in the foreground, with the background filled with dense foliage. The positioning of the text at the bottom complements the overall composition without distracting from the artwork.

#### 10. Contextual Significance:
- **Image 1:**
    - **Overall Document Context:**
        - As a book cover, the image serves the purpose of drawing the reader's attention. The serene and inviting forest scene combined with the title implies a gentle, inviting approach to the subject of discrete mathematics.

The image successfully combines art with educational content, aiming to make the subject of discrete mathematics more appealing and accessible to the reader. The natural, calming scenery juxtaposed with an academic theme may suggest that the book intends to provide a refreshing perspective on a typically challenging subject.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 3

Context: # Contents at a glance

## Contents at a glance
- **Contents at a glance** …… i
- **Preface** …… iii
- **Acknowledgements** …… v
  
1. **Meetup at the trailhead** …… 1
2. **Sets** …… 7
3. **Relations** …… 35
4. **Probability** …… 59
5. **Structures** …… 85
6. **Counting** …… 141
7. **Numbers** …… 165
8. **Logic** …… 197
9. **Proof** …… 223

Also be sure to check out the forever-free-and-open-source instructional videos that accompany this series at [www.allthemath.org](http://www.allthemath.org)!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 5

Context: # Preface

Discrete math is a popular book topic — start Googling around and you’ll find a zillion different textbooks about it. Take a closer look, and you’ll discover that most of these are pretty thick, dense volumes packed with lots of equations and proofs. They’re principled approaches, written by mathematicians and (seemingly) to mathematicians. I speak with complete frankness when I say I’m comforted to know that the human race is well covered in this area. We need smart people who can derive complex expressions and prove theorems from scratch, and I’m glad we have them.

Your average computer science practitioner, however, might be better served by a different approach. There are elements to the discrete math mindset that a budding software developer needs experience with. This is why discrete math is (properly, I believe) part of the mandatory curriculum for most computer science undergraduate programs. But for future programmers and engineers, the emphasis should be different than it is for mathematicians and researchers in computing theory. A practical computer scientist mostly needs to be able to use these tools, not to derive them. She needs familiarity, and practice, with the fundamental concepts and the thought processes they involve. The number of times the average software developer will need to construct a proof in graph theory is probably near zero. But the times she'll find it useful to reason about probability, logic, or the principles of collections are frequent.

I believe the majority of computer science students benefit most from simply gaining an appreciation for the richness and rigor of
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 6

Context: # PREFACE

This material, what it means, and how it impacts their discipline. Becoming an expert theorem prover is not required, nor is deriving closed-form expressions for the sizes of trees with esoteric properties. Basic fluency with each topic area, and an intuition about when it can be applied, is the proper aim for most of those who would go forward and build tomorrow's technology.

To this end, the book in your hands is a quick guided tour of introductory-level discrete mathematics. It's like a cool, brisk walk through a pretty forest. I point out the notable features of the landscape and try to instill a sense of appreciation and even of awe. I want the reader to get a feel for the lay of the land and a little exercise. If the student acquires the requisite vocabulary, gets some practice playing with the toys, and learns to start thinking in terms of the concepts here described, I will count it as a success.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 9

Context: # Chapter 1

## Meetup at the trailhead

Before we set out on our “cool, brisk walk,” let’s get oriented. What is discrete mathematics, anyway? Why is it called that? What does it encompass? And what is it good for?

Let’s take the two words of the subject, in reverse order. First, **math**. When most people hear “math,” they think “numbers.” After all, isn’t math the study of quantity? And isn’t that the class where we first learned to count, add, and multiply?

Mathematics certainly has its root in the study of numbers — specifically, the “natural numbers” (the integers from 1 on up) that fascinated the ancient Greeks. Yet math is broader than this, almost to the point where numbers can be considered a special case of something deeper. In this book, when we talk about trees, sets, or formal logic, there might not be a number in sight.

Math is about **abstract, conceptual objects that have properties,** and the implications of those properties. An “object” can be any kind of “thought material” that we can define and reason about precisely. Much of math deals with questions like, “Suppose we defined a certain kind of thing that had certain attributes. What would be the implications of this, if we reasoned it all the way out?” The “thing” may or may not be numerical, whatever it turns out to be. Like a number, however, it will be crisply defined, have certain known aspects to it, and be capable of combining with other things in some way.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 10

Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD

Fundamental to math is that it deals with the **abstract**. Abstract, which is the opposite of concrete, essentially means something that can’t be perceived with the senses. A computer chip is concrete: you can touch it, you can see it. A number is not; nor is a function, a binary tree, or a logical implication. The only way to perceive these things is with the power of the mind. We will write expressions and draw pictures of many of our mathematical structures in order to help visualize them, and nearly everything we study will have practical applications whereby the abstractness gets grounded in concreteness for some useful purpose. But the underlying mathematical entity remains abstract and ethereal — only accessible to the mind’s eye. We may use a pencil to form the figure “5” on a piece of paper, but that is only a concrete manifestation of the underlying concept of “five-ness.” Don’t mistake the picture or the symbol for the thing itself, which always transcends any mere physical representation.

The other word in the name of our subject is **“discrete”** (not to be confused with “discreet,” which means something else entirely). The best way to appreciate what discrete means is to contrast it with its opposite, **continuous**. Consider the following list:

| Discrete                      | Continuous             |
|-------------------------------|------------------------|
| whole numbers (\( \mathbb{Z} \))  | real numbers (\( \mathbb{R} \)) |
| int                           | double                 |
| digital                       | analog                 |
| quantum                       | continuum              |
| counting                      | measuring              |
| number theory                 | analysis               |
| \( \Sigma \)                  | \( \int \frac{d}{dx} \) |

What do the left-hand entries have in common? They describe things that are measured in crisp, distinct intervals, rather than varying smoothly over a range. Discrete things jump suddenly from position to position, with rigid precision. If you’re 5 feet tall, you might some day grow to 5.3 feet; but though there might be 5
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 11

Context: # Understanding Integration and Differentiation

People in your family, there will never be 5.3 of them (although there could be 6 someday).

The last couple of entries on this list are worth a brief comment. They are math symbols, some of which you may be familiar with.

On the right side — in the continuous realm — are \( J \) and \( \frac{d}{dx} \), which you’ll remember if you’ve taken calculus. They stand for the two fundamental operations of integration and differentiation. Integration, which can be thought of as finding “the area under a curve,” is basically a way of adding up a whole infinite bunch of numbers over some range. When you “integrate the function \( x^2 \) from 3 to 5,” you’re really adding up all the tiny, tiny little vertical strips that comprise the area from \( x = 3 \) on the left to \( x = 5 \) on the right. Its corresponding entry in the left-column of the table is \( \Sigma \), which is just a short-hand for “sum up a bunch of things.” Integration and summation are equivalent operations; it’s just that when you integrate, you’re adding up all the (infinitely many) slivers across the real-line continuum. When you sum, you’re adding up a fixed sequence of entries, one at a time, like in a loop. \( \Sigma \) is just the discrete “version” of \( J \).

The same sort of relationship holds between ordinary subtraction (\(-\)) and differentiation (\(\frac{d}{dx}\)). If you’ve plotted a bunch of discrete points on \( x \)-\( y \) axes, and you want to find the slope between two of them, you just subtract their \( y \) values and divide by the \( x \) distance between them. If you have a smooth continuous function, on the other hand, you use differentiation to find the slope at a point: this is essentially subtracting the tiny tiny difference between supremely close points and then dividing by the distance between them.

Don’t worry; you don’t need to have fully understood any of the integration or differentiation stuff I just talked about, or even to have taken calculus yet. I’m just trying to give you some feel for what “discrete” means, and how the dichotomy between discrete and continuous really runs through all of math and computer science. In this book, we will mostly be focusing on discrete values and structures, which turn out to be of more use in computer science. That’s partially because, as you probably know, computers
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 12

Context: # CHAPTER 1. MEETUP AT THE TRAILHEAD

Themselves are discrete, and can only store and compute discrete values. There can be many of them — megabytes, gigabytes, terabytes — but each value stored is fundamentally comprised of bits, each of which has a value of either 0 or 1. This is unlike the human brain, by the way, whose neuronal synapses communicate based on the continuous quantities of chemicals present in their axons. So I guess “computer” and “brain” are another pair of entities we could add to our discrete vs. continuous list.

There’s another reason, though, why discrete math is of more use to computer scientists than continuous math is, beyond just the bits-and-bytes thing. Simply put, computers operate algorithmically. They carry out programs in step-by-step, iterative fashion. First do this, then do that, then move on to something else. This mechanical execution, like the ticking of a clock, permeates everything the computer can do, and everything we can tell it to do. At a given moment in time, the computer has completed step 7, but not step 8; it has accumulated 38 values, but not yet 39; its database has exactly 15 entries in it, no more and no less; it knows that after accepting this friend request, there will be exactly 553 people in your set of friends. The whole paradigm behind reasoning about computers and their programs is discrete, and that’s why we computer scientists find different problems worth thinking about than most of the world did a hundred years ago.

But it’s still math. It’s just discrete math. There’s a lot to come, so limber up and let me know when you’re ready to hit the road.

## 1.1 Exercises

Use an index card or a piece of paper folded lengthwise, and cover up the right-hand column of the exercises below. Read each exercise in the left-hand column, answer it in your mind, then slide the index card down to reveal the answer and see if you’re right! For every exercise you missed, figure out why you missed it before moving on.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 13

Context: # 1.1 EXERCISES

1. **What’s the opposite of concrete?**  
   Abstract.

2. **What’s the opposite of discrete?**  
   Continuous.

3. **Consider a quantity of water in a glass. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Concrete, since it’s a real entity you can experience with the senses. Continuous, since it could be any number of ounces (or liters, or tablespoons, or whatever). The amount of water certainly doesn’t have to be an integer. (Food for thought: since all matter is ultimately comprised of atoms, are even substances like water discrete?)

4. **Consider the number 27. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Abstract, since you can’t see or touch or smell “twenty-seven.” Probably discrete, since it’s an integer, and when we think of whole numbers we think “discrete.” (Food for thought: in real life, how would you know whether I meant the integer “27” or the decimal number “27.00”? And does it matter?) Clearly it’s discrete. Abstract vs. concrete, though, is a little tricky. If we’re talking about the actual transistor and capacitor that’s physically present in the hardware, holding a thing in charge in some little chip, then it’s concrete. But if we’re talking about the value “1” that is conceptually part of the computer’s currently executing state, then it’s really abstract just like 27 was. In this book, we’ll always be talking about bits in this second, abstract sense.

5. **Consider a bit in a computer’s memory. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Any kind of abstract object that has properties we can reason about. 

6. **If math isn’t just about numbers, what else is it about?**  
   Any kind of abstract object that has properties we can reason about.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 15

Context: # Chapter 2

## Sets

The place from which we'll start our walk is a body of mathematics called "set theory." Set theory has an amazing property: it's so simple and applicable that almost all the rest of mathematics can be based on it! This is all the more remarkable because set theory itself came along pretty late in the game (as things go) — it was singlehandedly invented by one brilliant man, Georg Cantor, in the 1870s. That may seem like a long time ago, but consider that by the time Cantor was born, mankind had already accumulated an immense wealth of mathematical knowledge: everything from geometry to algebra to calculus to prime numbers. Set theory was so elegant and universal, though, that after it was invented, nearly everything in math was redefined from the ground up to be couched in the language of sets. It turns out that this simple tool is an amazingly powerful way to reason about mathematical concepts of all flavors. Everything else in this book stands on set theory as a foundation.

Cantor, by the way, went insane as he tried to extend set theory to fully encompass the concept of infinity. Don't let that happen to you!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 17

Context: # 2.1. THE IDEA OF A SET

is the same whether we write it like this:

\( P = \{ \text{Dad}, \text{Mom} \} \)

or this:

\( P = \{ \text{Mom}, \text{Dad} \} \)

Those are just two different ways of writing the same thing.

The set \( E \) that had nobody in it can be written like this, of course:

\( E = \{ \} \)

but we sometimes use this special symbol instead:

\( E = \emptyset \)

However you write it, this kind of set (one that has no elements) is referred to as an **empty set**.

The set \( H \), above, contained all the members of the group under consideration. Sometimes we'll refer to "the group under consideration" as the **domain of discourse**. It too is a set, and we usually use the symbol \( \Omega \) to refer to it. So in this case,

\( \Omega = \{ \text{Mom}, \text{Johnny}, \text{T.J.}, \text{Dad}, \text{Lizzy} \} \)

Another symbol we'll use a lot is \( \in \), which means "is a member of." Since Lizzy is a female, we can write:

\( \text{Lizzy} \in F \)

to show that Lizzy is a member of the set \( F \). Conversely, we write:

\( \text{T.J.} \notin F \)

to show that T.J. is not.

As an aside, I mentioned that every item is either in, or not in, a set: there are no shades of gray. Interestingly, researchers have developed another body of mathematics called [I kid you not] *fuzzy*[^1] set theory.

[^1]: Some authors use the symbol \( U \) for this, and call it the **universal set**.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 18

Context: # CHAPTER 2. SETS

Set theory? Fuzzy sets change this membership assumption: items can indeed be “partially in” a set. One could declare, for instance, that Dad is “100% female,” which means he’s only 100% in the \( F \) set. That might or might not make sense for gender, but you can imagine that if we defined a set \( T \) of “the tall people,” such a notion might be useful. At any rate, this example illustrates a larger principle which is important to understand: in math, things are the way they are simply because we’ve decided it’s useful to think of them that way. If we decide there’s a different useful way to think about them, we can define new assumptions and proceed according to new rules. It doesn’t make any sense to say “sets are (or aren’t) really fuzzy”: because there is no “really.” All mathematics proceeds from whatever mathematicians have decided is useful to define, and any of it can be changed at any time if we see fit.

## 2.2 Defining sets

There are two ways to define a set: extensionally and intensionally. I’m not saying there are two kinds of sets; rather, there are simply two ways to specify a set.

To define a set extensionally is to list its actual members. That’s what we did when we said \( P = \{ Dad, Mom \} \) above. In this case, we’re not giving any “meaning” to the set; we’re just mechanically spelling out what’s in it. The elements Dad and Mom are called the **extension** of the set \( P \).

The other way to specify a set is intensionally, which means to describe its meaning. Another way to think of this is specifying a rule by which it can be determined whether or not a given element is in the set. If I say “Let \( P \) be the set of all parents,” I am defining \( P \) intensionally. I haven’t explicitly said which specific elements of the set are in \( P \). I’ve just given the meaning of the set, from which you can figure out the extension. We call “parent-ness” the **intension** of \( P \).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 21

Context: 2.4 Sets are not arrays
=======================

If you’ve done some computer programming, you might see a resemblance between sets and the collections of items often used in a program: arrays, perhaps, or linked lists. To be sure, there are some similarities. But there are also some very important differences, which must not be overlooked:

- **No order.** As previously mentioned, there is no order to the members of a set. “{Mom, Dad}” is the same set as “{Dad, Mom}”. In a computer program, of course, most arrays or lists have first, second, and last elements, and an index number assigned to each.

- **No duplicates.** Suppose \( M \) is the set of all males. What would it possibly mean to say \( M = \{T.J., T.J., Johnny\} \)? Would that mean that “T.J. is twice the man that Johnny is”? This is obviously nonsensical. The set \( M \) is based on a property: maleness. Each element of \( M \) is either male, or it isn't. It can't be “male three times.” Again, in an array or linked list, you could certainly have more than one copy of the same item in different positions.

- **Infinite sets.** 'Nuff said. I’ve never seen an array with infinitely many elements, and neither will you.

- **Untyped.** Most of the time, an array or other collection in a computer program contains elements of only a single type; it’s an array of integers, or a linked list of Customer objects, for example. This is important because the program often needs to treat all elements in the collection the same way. Perhaps it needs to loop over the array to add up all the numbers, or iterate through a linked list and search for customers who have not placed an order in the last six months.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 22

Context: ```
# CHAPTER 2. SETS

A program would run into problems if it tried to add a string of text to its cumulative total, or encountered a Product object in the middle of its list of Customers. Sets, though, can be heterogeneous, meaning they can contain different kinds of things. The Davies family example had all human beings, but nothing stops me from creating a set \( S = \{ \text{Jack Nicholson}, \text{Kim Kardashian}, \text{Universal Studios}, 5756, \star \} \).

I don’t press this point too hard for a couple of reasons. First, most programming languages do allow heterogeneous collections of some sort, even if they’re not the most natural thing to express. In Java, you can define an `ArrayList` as a non-generic one that simply holds items of class “Object.” In C, you can have an array of void pointers to some unspecified type—which allows your array to point to different kinds of things. Unless it’s a loosely-typed language, though (like Perl or JavaScript), it sort of feels like you’re bending over backwards to do this. The other reason I make this distinction lightly is that when we’re dealing with sets, we often find it useful to deal with things of only one type, and so our \( S \) ends up being homogeneous anyway.

## 2.5 Sets are not ordered pairs (or tuples)

You’ll remember from high school algebra the notion of an ordered pair \( (x, y) \). We dealt with those when we wanted to specify a point to plot on a graph: the first coordinate gave the distance from the origin on the x-axis, and the second coordinate on the y-axis. Clearly an ordered pair is not a set, because as the name implies it is 
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 23

Context: # 2.6 Sets of Sets

Sets are heterogeneous — a single set can contain four universities, seven integers, and an image — and so it might occur to you that they can contain other sets as well. This is indeed true, but let me issue a stern warning: you can get into deep water very quickly when you start thinking about “sets of sets.” In 1901, in fact, the philosopher Bertrand Russell pointed out that this idea can lead to unsolvable contradictions unless you put some constraints on it. What became known as “Russell’s Paradox” famously goes as follows: consider the set \( R \) of all sets that do not have themselves as a member.

For this reason, we’ll be very careful to use curly braces to denote sets, and parentheses to denote ordered pairs.

By the way, although the word “coordinate” is often used to describe the elements of an ordered pair, that’s really a geometry-centric word that implies a visual plot of some kind. Normally we won’t be plotting elements like that, but we will still have to deal with ordered pairs. I’ll just call the constituent parts “elements” to make it more general.

Three-dimensional points need ordered triples \( (x, y, z) \), and it doesn’t take a rocket scientist to deduce that we could extend this to any number of elements. The question is what to call them, and you do sort of sound like a rocket scientist (or other generic nerd) when you say tuple. Some people rhyme this word with “Drupal,” and others with “couple,” by the way, and there seems to be no consensus. If you have an ordered pair-type thing with 5 elements, therefore, it’s a 5-tuple (or a quintuple). If it has 117 elements, it’s a 117-tuple, and there’s really nothing else to call it. The general term (if we don’t know or want to specify how many elements) is n-tuple. In any case, it’s an ordered sequence of elements that may contain duplicates, so it’s very different than a set.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 24

Context: # CHAPTER 2. SETS

as members³. Now is \( R \) a member of itself, or isn’t it? Either way you answer turns out to be wrong (try it!) which means that this whole setup must be flawed at some level.

The good news is that as long as you don’t deal with this kind of self-referential loop (“containing yourself as a member”) then it’s pretty safe to try at home. Consider this set:

\[
V = \{ 3, 5, \{ 5, 4 \}, 2 \}
\]

This set has four (not five) members. Three of \( V \)’s members are integers: 2, 3, and 5. The other one is a set (with no name given). That other set, by the way, has two members of its own: 4 and 5. If you were asked, “is \( 4 \in V \)?” the answer would be no.

As a corollary to this, there’s a difference between

\[
\emptyset
\]

and 

\[
\{ \emptyset \}
\]

The former is a set with no elements. The latter is a set with one element: and that element just happens to be a set with nothing in it.

## 2.7 Cardinality

When we talk about the number of elements in a set, we use the word cardinality. You’d think we could just call it the “size” of the set, but mathematicians sometimes like words that sound cool. The cardinality of \( M \) (the set of males, where the Davies family is the domain of discourse) is 3, because there are three elements in it. The cardinality of the empty set \( \emptyset \) is 0. The cardinality of the set of all integers is \( \aleph_0 \). Simple as that.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 25

Context: 2.8 Some special sets
=====================

In addition to the empty set, there are symbols for some other common sets, including:

- **Z** — the integers (positive, negative, and zero)
- **N** — the natural numbers (positive integers and zero)
- **Q** — the rational numbers (all numbers that can be expressed as an integer divided by another integer)
- **R** — the real numbers (all numbers that aren't imaginary, even decimal numbers that aren't rational)

The cardinality of all these sets is infinity, although, as I alluded to previously, \(\mathbb{R}\) is in some sense "greater than" \(\mathbb{N}\). For the curious, we say that \(\mathbb{N}\) is a countably infinite set, whereas \(\mathbb{R}\) is uncountably infinite. Speaking very loosely, this can be thought of this way: if we start counting up all the natural numbers, 0, 1, 2, 3, 4, ..., we will never get to the end of them. But at least we can start counting. With the real numbers, we can't even get off the ground. Where do you begin? Starting with 0 is fine, but then what's the "next" real number? Choosing anything for your second number inevitably skips a lot in between. Once you've digested this, I'll spring another shocking truth on you: \(|\mathbb{Q}| \text{ is actually equal to } |\mathbb{R}|, \text{ not greater than it as } |\mathbb{R}|. \text{ Cantor came up with an ingenious numbering scheme whereby all the rational numbers — including 3, -9, \text{ and } \frac{1}{17} — can be listed off regularly, in order, just like the integers can. And so } |\mathbb{Q}| = |\mathbb{N}| \neq |\mathbb{R}|. \text{ This kind of stuff can blow your mind.}
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 27

Context: # 2.9. COMBINING SETS

\( B \) is “all the things *period* that aren't in \( B' \).” Of course, “all the things period” means “all the things that we're currently talking about.” The domain of discourse \( \Omega \) is very important here. If we're talking about the Davies family, we would say that \( M = \{ \text{Mom}, \text{Lizzy} \} \), because those are all the Davies who aren’t male. If, on the other hand, \( \Omega \) is “the grand set of absolutely everything,” then not only is Mom a member of \( M \), but so is the number 12, the French Revolution, and my nightmare last Tuesday about a rabid platypus.

- **Cartesian product** (\( \times \)). Looks like multiplication, but very different. When you take the Cartesian product of two sets \( A \) and \( B \), you don’t even get the elements from the sets in the result. Instead, you get ordered pairs of elements. These ordered pairs represent each combination of an element from \( A \) and an element from \( B \). For instance, suppose \( A = \{ \text{Bob}, \text{Dave} \} \) and \( B = \{ \text{Jenny}, \text{Gabrielle}, \text{Tiffany} \} \). Then:

\[
A \times B = \{ (\text{Bob}, \text{Jenny}), (\text{Bob}, \text{Gabrielle}), (\text{Bob}, \text{Tiffany}), (\text{Dave}, \text{Jenny}), (\text{Dave}, \text{Gabrielle}), (\text{Dave}, \text{Tiffany}) \}
\]

Study that list. The first thing to realize is that it consists of neither guys nor girls, but of ordered pairs. (Clearly, for example, \( \text{Jenny} \in A \times B \).) Every guy appears exactly once with every girl, and the guy is always the first element of the ordered pair. Since we have two guys and three girls, there are six elements in the result, which is an easy way to remember the \( x \) sign that represents Cartesian product. (Do not worry, make the common mistake of thinking that \( A \times B \) is \( A \) or \( B \) as a set, not a number. The cardinality of the set, of course, is \( 6 \), so it's appropriate to write \( |A \times B| = 6 \).)

## Laws of Combining Sets

There are a bunch of handy facts that arise when combining sets using the above operators. The important thing is that these are all easily seen just by thinking about them for a moment. Put another way, these aren’t facts to memorize; they’re facts to look at and see.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 28

Context: # CHAPTER 2. SETS

for yourself. They’re just a few natural consequences of the way we’ve defined sets and operations, and there are many others.

- **Union and intersection are commutative.** As noted above, it’s easy to see that \( A \cup B \) will always give the same result as \( B \cup A \). Same goes for \( \cap \), though.

- **Union and intersection are associative.** “Associative” means that if you have an operator repeated several times, left to right, it doesn’t matter which order you evaluate them in. \( (A \cup B) \cup C \) will give the same result as \( A \cup (B \cup C) \). This means we can freely write expressions like \( X \cup Y \cup Z \) and no one can accuse us of being ambiguous. This is also true if you have three (or more) intersections in a row. Be careful, though: associativity does not hold if you have unions and intersections mixed together. If I write \( A \cap B \cap C \) it matters very much whether I do the union first or the intersection first. This is just how it works with numbers: \( 4 + 3 \) gives either \( 10 \) or \( 14 \) depending on the order of operations. In algebra, we learned that \( x \) has precedence over \( + \), and you’ll always do that one first in the absence of parentheses. We could establish a similar order for set operations, but we won’t: we’ll always make it explicit with parentheses.

- **Union and intersection are distributive.** You’ll recall from basic algebra that \( a \cdot (b + c) = ab + ac \). Similarly with sets,

\[ X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z) \]

It's important to work this out for yourself rather than just memorize it as a rule. Why does it work? Well, take a concrete example. Suppose \( Y \) is the set of all female students, \( S \) is the set of all computer science majors, and \( Z \) is the set of all math majors. (Some students, of course, double-major in both.) The left-hand side of the equals sign says “first take all the math and computer science majors and put them in a group. Then, intersect that group with the women’s group.” The result is “women who are either computer science majors or math majors (or both).” Now
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 30

Context: # CHAPTER 2. SETS

Let \( X \) be the set of all known murderers. Let \( Y \) be the set of all known thieves. 

Again, it's best understood with a specific example. Let’s say you’re renting a house and want to make sure you don’t have any shady characters under the roof. 

Now, as a landlord, you don’t want any thieves or murderers renting your property. So who are you willing to rent to? Answer: if \( \Omega \) is the set of all people, you are willing to rent to \( X \cup Y \).

Why that? Because if you take \( X \cup Y \), that gives you all the undesirables: people who are either murderers or thieves (or both). You don’t want to rent to any of them. In fact, you want to rent to the complement of that set; namely, “anybody else.” Putting an overbar on that expression gives you all the non-thieves and non-murderers.

Very well. But now look at the right-hand side of the equation. \( X \) gives you the non-thieves. \( Y \) gives you the non-murderers. Now in order to get acceptable people, you want to rent only to someone who’s in both groups. Put another way, they have to be both a non-thief and a non-murderer in order for you to rent to them. Therefore, they must be in the intersection of the non-thief group and the non-murderer group. Therefore, the two sides of this equation are the same.

The other form of De Morgan’s law is stated by flipping the intersections and unions:

\[
X \cap Y = \overline{X \cup Y}
\]

Work this one out for yourself using a similar example, and convince yourself it’s always true.

Augustus De Morgan, by the way, was a brilliant 19th century mathematician with a wide range of interests. His name
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 31

Context: 2.10 Subsets
===========

We learned that the “∈” symbol is used to indicate set membership: the element on the left is a member of the set on the right. A related but distinct notion is the idea of a subset. When we say \( X \subset Y \) (pronounced “X is a subset of Y”), it means that every member of \( X \) is also a member of \( Y \). The reverse is not necessarily true, of course; otherwise “⊆” would just mean “=”. So if \( A = \{ Dad, Lizzy \} \) and \( K = \{ Dad, Mom, Lizzy \} \), then we can say \( A \subset K \).

Be careful about the distinction between “∈” and “⊆”, which are often confused. With “∈”, the thing on the left is an element, whereas with “⊆”, the thing on the left is a set. This is further complicated by the fact that the element on the left-hand side of “∈” might well be a set.

Let's see some examples. Suppose \( Q \) is the set \( \{ 4, 9, 4 \} \).  \( Q \) has three elements here, one of which is itself a set. Now suppose that we let \( P \) be the set \( \{ 4, 9 \} \). Question: Is \( P \subset Q \)? The answer is yes: the set \( \{ 4, 9 \} \) (which is the same as the set \( \{ 9, 4 \} \), just written in a different way) is in fact an element of the set \( Q \). Next question: Is \( P \subset Q \)? The answer is no, \( P \nsubseteq Q \). If there are two of them (4 and 4) also in \( Q \), then 4 is a member of \( Q \), not 9. Last question: If \( R \) is defined to be \( \{ 2, 4 \} \), is \( R \subset Q \)? The answer is yes, since both 2 and 4 are members of \( Q \).

Notice that the definition here is a subset of itself. Sometimes, though, it’s useful to talk about whether a set is really a subset of another, and you don’t want to “count” if the two sets are actually equal. This is called a proper subset, and the symbol for it is \( \subsetneq \). You can see the rationale for the choice of symbol, because “⊆” is kind of like “<” for numbers, and “⊂” is like “<”.

Every set is a subset (not necessarily a proper one) of itself, because
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 33

Context: ```markdown
## 2.11. POWER SETS

Don't lose sight. There are four elements to the power set of \( A \), each of which is one of the possible subsets. It might seem strange to talk about "all of the possible subsets"—when I first learned this stuff, I remember thinking at first that there would be no limit to the number of subsets you could make from a set. But of course there is. To create a subset, you can either include, or exclude, each one of the original set's members. In \( A \)’s case, you can either (1) include both Dad and Lizzy, or (2) include Dad but not Lizzy, or (3) include Lizzy but not Dad, or (4) exclude both, in which case your subset is empty. Therefore, \( P(A) \) includes all four of those subsets.

Now what's the cardinality of \( P(X) \) for some set \( X \)? That's an interesting question, and one well worth pondering. The answer ripples through the heart of a lot of combinatorics and the binary number system, topics we'll cover later. And the answer is right at our fingertips if we just extrapolate from the previous example. To form a subset of \( X \), we have a choice to either include, or else exclude, each of its elements. So there's two choices for the first element, and then whether we choose to include or exclude that first element, there are two choices for the second, etc. 

So if \( |X| = 2 \) (recall that this notation means \( X \) has two elements) or \( X \) has a cardinality of \( 2^n \), then its power set has \( 2 \times 2 \) members. If \( |X| = 3 \), then its power set has \( 2 \times 2 \times 2 \) members. In general:

\[
|P(X)| = 2^{|X|}.
\]

As a limiting case (and a brain-bender), notice that if \( X \) is the empty set, then \( P(X) \) has one (not zero) members, because there is in fact one subset of the empty set: namely, the empty set itself. So \( |X| = 0 \), and \( |P(X)| = 1 \). And that jives with the above formula.
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 36

Context: # CHAPTER 2: SETS

might be double-majoring in both. Hence, this group of subsets is neither mutually exclusive nor collectively exhaustive. It’s interesting to think about gender and partitions: when I grew up, I was taught that males and females were a partition of the human race. But now I’ve come to realize that there are non-binary persons who do not identify with either of those genders, and so it’s not a partition after all.

## Question:
Is the number of students \( |S| \) equal to the number of off-campus students plus the number of on-campus students? Obviously yes. But why? The answer: because the off-campus and on-campus students form a partition. If we added up the number of freshmen, sophomores, juniors, and seniors, we would also get \( |S| \). But adding up the number of computer science majors and English majors would almost certainly not be equal to \( |S| \), because some students would be double-counted and others counted not at all. This is an example of the kind of beautiful simplicity that partitions provide.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 38

Context: ```
# CHAPTER 2. SETS

1. **Is Yoda ⊂ J?**  
   No. Yoda isn't even a set, so it can't be a subset of anything.

2. **Is { Yoda } ⊆ J?**  
   Yes. The { Yoda } set that contains only Yoda is in fact a subset of J.

3. **Is { Yoda } ∈ J?**  
   No. Yoda is one of the elements of J, but { Yoda } is not. In other words, J contains Yoda, but J does not contain a set which contains Yoda (nor does it contain any sets at all, in fact).

4. **Is S ⊆ J?**  
   No.

5. **Is G ⊆ F?**  
   Yes, since the two sets are equal.

6. **Is G ⊂ F?**  
   No, since the two sets are equal, so neither is a proper subset of the other.

7. **Is ∅ ⊆ C?**  
   Yes, since the empty set is a subset of every set.

8. **Is ∅ ⊂ C?**  
   Yes, since the empty set is a subset of every set.

9. **Is F ⊆ Ω?**  
   Yes, since every set is a subset of Ω, and F is certainly not equal to Ω.

10. **Is F ⊂ Ω?**  
   Yes and yes. The empty set is an element of X because it's one of the elements, and it's also a subset of X because it's a subset of every set. Hmmm...

11. **Suppose X = { Q, α, { Z } }. Is α ∈ X?**  
   Yes.

12. **Let A = { Macbeth, Hamlet, Othello }. B = { Scrabble, Monopoly, Othello }, and T = { Hamlet, Village, Town }. What's A ∩ B?**  
   { Othello }.

13. **What's A ∪ B?**  
   { Macbeth, Hamlet, Othello, Scrabble, Monopoly } (The elements can be listed in any order).
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 39

Context: # 2.13. EXERCISES

21. What’s \( A \cap B \)?  
   \{ Macbeth, Hamlet \}.

22. What’s \( B \cap T \)?  
   ∅.

23. What’s \( B \cap \overline{T} \)?  
   B. (which is { Scrabble, Monopoly, Othello. })

24. What’s \( A \cup (B \cap T) \)?  
   \{ Hamlet, Othello, Macbeth \}.

25. What’s \( (A \cup B) \cap T \)?  
   \{ Macbeth, Hamlet \}.

26. What’s \( A \setminus B \)?  
   Simply \( T \), since these two sets have nothing in common.

27. What’s \( T' \setminus B \)?  
   \{ (Hamlet, Macbeth), (Hamlet, Hamlet), (Hamlet, Othello), (Village, Macbeth), (Village, Hamlet), (Village, Othello), (Town, Hamlet), (Town, Othello) \}. The order of the ordered pairs within the set is not important; the order of the elements within each ordered pair is important.

28. What’s \( T \cap X \)?  
   0.

29. What’s \( (B \cap B) \times (A \cap T) \)?  
   \{ (Scrabble, Hamlet), (Monopoly, Hamlet), (Othello, Hamlet) \}.

30. What’s \( I \cup B \cup T \)?  
   7.

31. What’s \( I \cap (A \cap T) \)?  
   21. (The first parenthesized expression gives rise to a set with 7 elements, and the second to a set with three elements (B itself). Each element from the first set gets paired with an element from the second, so there are 21 such pairings.)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 40

Context: # CHAPTER 2. SETS

### 33. Is A an extensional set, or an intensional set?
The question doesn't make sense. Sets aren't "extensional" or "intensional"; rather, a given set can be described extensionally or intensionally. The description given in item 19 is an extensional one; an intensional description of the same set would be "The Shakespeare tragedies Stephen studied in high school."

### 34. Recall that G was defined as { Matthew, Mark, Luke, John }. Is this a partition of G?
- { Luke, Matthew }
- { John }

No, because the sets are not collectively exhaustive (Mark is missing).

### 35. Is this a partition of G?
- { Mark, Luke }
- { Matthew, Luke }

No, because the sets are neither collectively exhaustive (John is missing) nor mutually exclusive (Luke appears in two of them).

### 36. Is this a partition of G?
- { Matthew, Mark, Luke }
- { John }

Yes. (Trivia: this partitions the elements into the synoptic gospels and the non-synoptic gospels).

### 37. Is this a partition of G?
- { Matthew, Luke }
- { John, Mark }

Yes. (This partitions the elements into the gospels which feature a Christmas story and those that don't).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 41

Context: ## 2.13. EXERCISES

38. Is this a partition of \( G \)?

   Yes. (This partitions the elements into the gospels that were written by Jews, those that were written by Greeks, those that were written by Romans, and those that were written by Americans.)
   - \( \{ Matthew, John \} \)
   - \( \{ Luke \} \)
   - \( \{ Mark \} \)
   - \( \emptyset \)

39. What’s the power set of \( \{ Rihanna \} \)?

   \( \{ \{ Rihanna \}, \emptyset \} \)

   Yes, since \( \{ peanut, jelly \} \) is one of the eight subsets of \( \{ peanut, butter, jelly \} \). (Can you name the other seven?)

40. Is \( \{ peanut, jelly \} \in P(\{ peanut, butter, jelly \}) \)?

41. Is it true for every set \( S \) that \( S \in P(S) \)?

   Yep.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 43

Context: # Chapter 3

## Relations

Sets are fundamental to discrete math, both for what they represent in themselves and for how they can be combined to produce other sets. In this chapter, we’re going to learn a new way of combining sets, called relations.

### 3.1 The Idea of a Relation

A relation between a set \( X \) and \( Y \) is a subset of the Cartesian product. That one sentence packs in a whole heck of a lot, so spend a moment thinking deeply about it. Recall that \( X \times Y \) yields a set of ordered pairs, one for each combination of an element from \( X \) and an element from \( Y \). If \( X \) has 5 elements and \( Y \) has 4, then \( X \times Y \) is a set of 20 ordered pairs. To make it concrete, if \( X \) is the set \{ Harry, Ron, Hermione \} and \( Y \) is the set \{ Dr. Pepper, Mt. Dew \}, then \( X \times Y \) is:

- \( (Harry, Dr. Pepper) \)
- \( (Harry, Mt. Dew) \)
- \( (Ron, Dr. Pepper) \)
- \( (Ron, Mt. Dew) \)
- \( (Hermione, Dr. Pepper) \)
- \( (Hermione, Mt. Dew) \)

Convince yourself that every possible combination is in there. I listed them out methodically to make sure I didn’t miss any (all the Harry’s first, with each drink in order, then all the Ron’s, etc.), but of course there’s no order to the members of a set, so I could have listed them in any order.

Now if I define a relation between \( X \) and \( Y \), I’m simply specifying that certain of these ordered pairs are in the relation, and certain
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 49

Context: # 3.5. Properties of Endorelations

nor (me, you) would be in the relation, but in any event the two cannot co-exist.

Note that **antisymmetry** is very different from **symmetric**. An asymmetric relation is simply one that's not symmetric: in other words, there's some \((x, y)\) in there without a matching \((y, x)\). An antisymmetric relation, on the other hand, is one in which there are guaranteed to be no matching \((y, x)\) for any \((x, y)\).

If you have trouble visualizing this, here's another way to think about it: realize that most relations are neither symmetric nor antisymmetric. It's kind of a coincidence for a relation to be symmetric: that would mean for every single \((x, y)\) it contains, it also contains a \((y, x)\). (What are the chances?) Similarly, it's kind of a coincidence for a relation to be antisymmetric: that would mean for every single \((x, y)\) it contains, it doesn't contain a \((y, x)\). (Again, what are the chances?) Your average Joe relation is going to contain some \((x, y)\) pairs that have matching \((y, x)\) pairs, and some that don't have matches. Such relations (the vast majority) are simply asymmetric: that is, neither symmetric nor antisymmetric.

Shockingly, it's actually possible for a relation to be both symmetric and antisymmetric! (But not asymmetric.) For instance, the empty relation (with no ordered pairs) is both symmetric and antisymmetric. It's symmetric because for every ordered pair \((x, y)\) in it (of which there are zero), there's also the corresponding \((y, x)\). And similarly, for every ordered pair \((y, x)\), the corresponding \((x, y)\) is not present. Another example is a relation with only “doubles” in it — say, \((3, 3)\), \((7, 7)\), \((Fred, Fred)\). This, too, is both symmetric and antisymmetric (work it out!).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 52

Context: # CHAPTER 3. RELATIONS

I contend that in this toy example, “isAtLeastAsToughAs” is a partial order, and \( A \) along with \( isAtLeastAsToughAs \) together form a poset. I reason as follows. It’s reflexive, since we started off by adding every dog with itself. It’s antisymmetric, since when we need add both \( (x, y) \) and \( (y, x) \) to the relation. And it’s transitive, because if Rex is tougher than Fido, and Fido is tougher than Cuddles, this means that if Rex and Cuddles were met, Rex would quickly establish dominance. (I’m no zoologist, and am not sure if the last condition truly applies with real dogs. But let’s pretend it does.)

It’s called a “partial order” because it establishes a partial, but incomplete, hierarchy among dogs. If we ask, “is dog X tougher than dog Y?” the answer is never ambiguous. We’re never going to say, “well, dog X was superior to dog A, who was superior to dog Y . . .” but then again, dog Y was superior to dog B, who was superior to dog X, so there’s no telling which of X and Y is truly toughest.” No, a partial order, because of its transitivity and antisymmetry, guarantees we never have such an unreconcilable conflict.

However, we could have a lack of information. Suppose Rex has never met Killer, and nobody Rex has met has ever met anyone Killer has met. There’s no chain between them. They’re in two separate universes as far as we’re concerned, and we’d have no way of knowing which was toughest. It doesn’t have to be that extreme, though. Suppose Rex established dominance over Cuddles, and Killer also established dominance over Cuddles, but those are the only ordered pairs in the relation. Again, there’s no way to tell whether Rex or Killer is the tougher dog. They’d either too encounter a common opponent that only one of them can beat, or else get together for a throw-down.

So a partial order gives us some semblance of structure — the relation establishes a directionality, and we’re guaranteed not to get wrapped up in contradictions — but it doesn’t completely order all the elements. If it does, it’s called a total order.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 57

Context: # 3.7. PROPERTIES OF FUNCTIONS

would be “phoneExtension(Sally) = 1317”, indicating that Sally can be reached at x1317. Some of the available extensions may be currently unused, but every employee does have one (and only one) which makes it a function. But since no two employees have the same extension, it is also an injective function.

Injective functions are sometimes called **one-to-one** functions. (One-to-one and injective are exact synonyms.)

- **Surjective.** A surjective function is one that reaches all the elements of its codomain: some z does in fact reach every y. Another way of saying this is: for a surjective function, the range equals the entire codomain. You can see that this is impossible if the domain is smaller than the codomain, since there wouldn’t be enough z values to reach all the y values. If we added Pepsi and Barq’s Root Beer to our Y set, we would thereby eliminate the possibility of any surjective functions from X to Y (unless we also added wizards, of course).

The function `worksIn` — with employees as the domain and departments as the codomain — is an example of an surjective function. One mapping of this function would be `worksIn(Sid) = Marketing`, indicating that Sid works in the Marketing department. Each employee works for one department, which makes it a function. But at least one employee works in every department (i.e., there are no empty departments with no people in them) which makes it surjective.

Surjective functions are sometimes called **“onto”** functions. (Onto and surjective are exact synonyms.)

- **Bijective.** Finally, a bijective function is simply one that is both injective and surjective. With an injective function, every y is mapped to by at most one x; with a surjective function, every y is mapped to by at least one x; so with a bijective function, every y is mapped to by exactly one x. Needless to say, the domain and the codomain must have the same cardinality for this to be possible.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 60

Context: # CHAPTER 3. RELATIONS

own wizard, and no soft drinks (or wizards) are left out. How exciting! This is a perfectly bijective function, also called a **bijection**. Again, the only way to get a bijection is for the domain and codomain to be the same size (although that alone does not guarantee a bijection; witness \( f \), above). Also observe that if they are the same size, then injectivity and surjectivity go hand-in-hand. Violate one, and you’re bound to violate the other. Uphold the one, and you’re bound to uphold the other. There’s a nice, pleasing, symmetrical elegance to the whole idea.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 64

Context: # CHAPTER 3. RELATIONS

22. **Is sameShirtColor transitive?**  
   Yes. If Kirk and Sulu wear the same color (yellow), and Sulu and Chekov wear the same color (yellow), then Kirk and Chekov most certainly will wear the same color (yellow).

23. **Above, we defined A as the set {Chuck, Julie, Sam} and S as the set {basketball, volleyball}. Then we defined the relation { (Julie, basketball), (Sam, basketball), (Julie, volleyball) }. Is this relation a function?**  
   No, because it's missing Chuck entirely.

24. **Suppose we added the ordered pair (Chuck, basketball) to it. Now is it a function?**  
   No, because Julie appears twice, mapping to two different values.

25. **Okay. Suppose we then remove (Julie, volleyball). We now have { (Julie, basketball), (Sam, basketball), (Chuck, basketball) }. Is this a function?**  
   Yes. Congratulations.

26. **Let's call this function "faveSport", which suggests that its meaning is to indicate which sport is each athlete's favorite. What's the domain of faveSport?**  
   {Julie, Chuck, Sam}.

27. **What's the codomain of faveSport?**  
   {basketball, volleyball}.

28. **What's the range of faveSport?**  
   {basketball}.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 67

Context: # Chapter 4

## Probability

Probability is the study of **uncertainty**. This may seem like a hopeless endeavor, sort of like knowing the unknowable, but it's not. The study of probability gives us tools for taming the uncertain world we live and program in, and for reasoning about it in a precise and helpful way.

We may not know exactly how long a particular visitor is willing to wait for our webpage to load in their browser, but we can use probability to estimate how much traffic we’ll lose if this takes longer than a certain average duration. We may not know which specific passwords a hacker will try as he attempts to break our security protocol, but we can use probability to estimate how feasible this approach will be for him. We may not know exactly when a certain program will run out of RAM and have to swap its data out to virtual memory, but we can predict how often this is likely to occur — and how painful it will be for us — given a certain system load and user behavior.

The trick is to use the tools we've already built — sets, relations, functions — to characterize and structure our notions of the relative likelihood of various outcomes. Once those underpinnings are secured, a layer of deductive reasoning will help us make good use of that information to begin to predict the future.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 68

Context: # 4.1 Outcomes and events

Since life is uncertain, we don’t know for sure what is going to happen. But let’s start by assuming we know what things might happen. Something that might happen is called an outcome. You can think of this as the result of an experiment if you want to, although normally we won’t be talking about outcomes that we have explicitly manipulated and measured via scientific means. It’s more like we’re just curious how some particular happening is going to turn out, and we’ve identified the different ways it can turn out and called them outcomes.

Now we’ve been using the symbol \( \Omega \) to refer to “the domain of discourse” or “the universal set” or “all the stuff we’re talking about.” We’re going to give it yet another name now: the sample space. \( \Omega \), the sample space, is simply the *set of all possible outcomes*. Any particular outcome — call it \( O \) — is an element of this set, just like in Chapter 1 every conceivable element was a member of the domain of discourse.

If a woman is about to have a baby, we might define \( \Omega \) as \{ boy, girl \}. Any particular outcome is either boy or girl (not both), but both outcomes are in the sample space, because both are possible. If we roll a die, we’d define \( \Omega \) as \{ 1, 2, 3, 4, 5, 6 \}. If we’re interested in motor vehicle safety, we might define \( \Omega \) for a particular road trip as \{ safe, accident \}. The outcomes don’t have to be equally likely; an important point we’ll return to soon.

In probability, we define an event as a *subset of the sample space*. In other words, an event is a group of related outcomes (though an event might contain just one outcome, or even zero). I always thought this was a funny definition for the word “event”: it’s not the first thing that word brings to mind. But it turns out to be a useful concept, because sometimes we’re not interested in any particular outcome necessarily, but rather in whether the outcome — whatever it is — has a certain property. For instance, suppose at the start of some game, my opponent and I each roll the die, agreeing that the highest roller gets to go first. Suppose he rolls a 2. Now it’s my turn. The \( \Omega \) for my die roll is of course \{ 1, 2, 3 \}.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 71

Context: # 4.2. PROBABILITY MEASURES

Now it turns out that not just any function will do as a probability measure, even if the domain (events) and codomain (real numbers in the range [0,1]) are correct. In order for a function to be a “valid” probability measure, it must satisfy several other rules:

1. \( \text{Pr}(\Omega) = 1 \)
2. \( \text{Pr}(A) \geq 0 \) for all \( A \subseteq \Omega \)
3. \( \text{Pr}(A \cup B) = \text{Pr}(A) + \text{Pr}(B) - \text{Pr}(A \cap B) \)

Rule 1 basically means "something has to happen." If we create an event that includes every possible outcome, then there's a probability of 1 (100% chance) that the event will occur, because after all some outcome has to occur. (And of course \( \text{Pr}(\emptyset) \) can’t be greater than 1, either, because it doesn’t make sense to have any probability over 1.) Rule 2 says there’s no negative probabilities: you can’t define any event, no matter how remote, that has a less than zero chance of happening.

Rule 3 is called the “additivity property,” and is a bit more difficult to get your head around. A diagram works wonders. Consider Figure 4.1, called a "Venn diagram," which visually depicts sets and their contents. Here we have defined three events: \( F \) (as above) is the event that the winner is a woman; \( R \) is the event that the winner is a rock musician (perhaps in addition to other musical genres); and \( U \) is the event that the winner is underage (i.e., becomes a multimillionaire before they can legally drink). Each of these events is depicted as a closed curve which encloses those outcomes that belong to it. There is obviously a great deal of overlap.

Now back to rule 3. Suppose I ask “what’s the probability that the All-time Idol winner is underage or a rock star?” Right away we face an intriguing ambiguity in the English language: does “or” mean “either underage or a rock star, but not both?” Or does it mean “underage and/or rock star?” The former interpretation is called an exclusive or and the latter an inclusive or. In computer science, we will almost always be assuming an inclusive or, unless explicitly noted otherwise.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 74

Context: # 4. PROBABILITY

It’s the size (cardinality) of the event set that matters, and the ratio of this number to the total number of events is the probability. For example, if we deal from a fair deck, the probability of drawing a face card is:

Pr(F) = \(\frac{|F|}{N}\)

= \(\frac{|\{K♠, K♥, K♦, K♣, \ldots, J♠\}|}{52}\)

= \(\frac{12}{52} = 0.231\)

Please realize that this shortcut **only** applies when the probability of each outcome is the same. We certainly couldn’t say, for example, that the probability of a user’s password starting with the letter q is just \(\frac{1}{26}\) because passwords surely don’t contain all letters with equal frequency. (At least, I’d be very surprised if that were the case.) The only way to solve a problem like this is to know how often each letter of the alphabet occurs.

## 4.3 Philosophical Interlude

Which brings me to an important question. How do we get these probability numbers, anyway? Everything so far has assumed that the numbers have been dropped into our lap.

The answer depends somewhat on your interpretation of what probability means. If we say *“the probability of getting heads on a coin flip is 0.5,”* what are we really saying? There have traditionally been two opposing answers to this question, called the frequentist view and the Bayesian view. It’s interesting to compare their claims.

The frequentist view is that we derive probabilities by simply running many trials and counting the results. The proportions of various outcomes yield a good idea of their probabilities, particularly if the sample size is large. Consider flipping a coin. If we flip a coin ten times and count three heads, we might not have a great
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 81

Context: # 4.5 TOTAL PROBABILITY

Pr(A) = Pr(A ∩ C₁) + Pr(A ∩ C₂) + ... + Pr(A ∩ C_N)  
= Pr(A|C₁)Pr(C₁) + Pr(A|C₂)Pr(C₂) + ... + Pr(A|C_N)Pr(C_N)  
= ∑ᵏ₌₁ Pr(A|Cₖ)Pr(Cₖ)  

is the formula.¹

Let's take an example of this approach. Suppose that as part of a promotion for Muvico Cinemas movie theatre, we're planning to give a door prize to the 1000th customer this Saturday afternoon. We want to know, though, the probability that this person will be a minor. Figuring out how many patrons overall will be under 18 might be difficult. But suppose we're showing these three films on Saturday: *Spider-Man: No Way Home*, *Here Before*, and *Sonic the Hedgehog 2*. We can estimate the fraction of each movie's viewers that will be minors: .6, .01, and .95, respectively. We can also predict how many tickets will be sold for each film: 2,000 for *Spider-Man*, 500 for *Here Before*, and 1,000 for *Sonic*.

Applying frequentist principles, we can compute the probability that a particular visitor will be seeing each of the movies:

- \(Pr(Spider-Man) = \frac{2000}{2000 + 500 + 1000} = 0.571\)

- \(Pr(Here \ Before) = \frac{500}{2000 + 500 + 1000} = 0.143\)

- \(Pr(Sonic) = \frac{1500}{2000 + 500 + 1000} = 0.286\)

¹ If you're not familiar with the notation in that last line, realize that \(Σ\) (a capital Greek "sigma") just represents a sort of loop with a counter. The \(k\) in the sign means that the counter is \(k\) and starts at 1. The "N" above the line means the counter goes up to \(N\), which is its last value. And what does the loop do? It adds up a cumulative sum. The thing being added to the total each time through the loop is the expression to the right of the sign. The last line with the \(Σ\) is just a more compact way of expressing the preceding line.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 89

Context: # 4.7 INDEPENDENCE

Whoops!

One last point on the topic of independence: please don’t make the mistake of thinking that mutually exclusive events are independent! This is by no means the case, and, in fact, the opposite is true. If two events are mutually exclusive, they are extremely dependent on each other!

Consider the most trivial case: I choose a random person on campus, and define \( I \) as the event that they’re an in-state student, and \( O \) as the event that they’re out-of-state. Clearly these events are mutually exclusive. But are they independent? Of course not! Think about it: if I told you a person was in-state, would that tell you anything about whether they were out-of-state? Duh. In a mutual exclusive case like this, event \( I \) completely rules out \( O \) (and vice versa), which means that although \( \text{Pr}(I) \) might be .635, \( \text{Pr}(I|O) \) is a big fat zero. More generally, \( \text{Pr}(A|B) \) is most certainly not going to be equal to \( \text{Pr}(A) \) if the two events are mutually exclusive, because learning about one event tells you everything about the other.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 93

Context: # Chapter 5

## Structures

Much of computer science deals with representing and manipulating information. To do this, people have devised various **structures** for organizing chunks of data in a way that makes it easy to store, search, and retrieve. There’s a whole course in most computer science curricula called “data structures” which covers how to implement these structures in code. In this book, we won’t be talking about the code, but rather the abstract structures themselves. This chapter has a lot of pictures in it, which depict examples of the various structures in a very general way. The concepts here map directly to code when you need to put them into practice.

There are all kinds of data structures — arrays, linked lists, queues, stacks, hashtables, and heaps, to name a few — but they almost all boil down to one of two fundamental kinds of things: graphs, and trees. There are the two structures we’ll focus on in this chapter. A graph is just about the most general structure you can envision: a bunch of scattered data elements that are related to each other in some way. Almost every data structure imaginable can be recast as a type of graph. Trees are sort of a special case of graphs, but also sort of a topic in their own right, kind of like functions were a special type of relation, but also kind of different. A tree can be seen as a type of graph that imposes extra special conditions which give some navigational benefit.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 95

Context: # 5.1. GRAPHS

The edges, we have precisely the structure of a graph. Psychologists have given this another name: a semantic network. It is thought that the myriad of concepts you have committed to memory — Abraham Lincoln, and bar of soap, and my fall schedule, and perhaps millions of others — are all associated in your mind in a vast semantic network that links the related concepts together. When your mind recalls information, or deduces facts, or even drifts randomly in idle moments, it’s essentially traversing this graph along the various edges that exist between vertices.

That’s deep. But you don’t have to go near that deep to see the appearance of graph structures all throughout computer science. What’s MapQuest, if not a giant graph where the vertices are travelable locations and the edges are routes between them? What’s Facebook, if not a giant graph where the vertices are people and the edges are friendships? What’s the World Wide Web, if not a giant graph where the vertices are pages and the edges are hyperlinks? What’s the Internet, if not a giant graph where the vertices are computers or routers and the edges are communication links between them? This simple scheme of linked vertices is powerful enough to accommodate a whole host of applications, which is worth studying.

## Graph Terms

The study of graphs brings with it a whole bevy of new terms which are important to use precisely:

1. **Vertex.** Every graph contains zero or more vertices.¹ (These are also sometimes called nodes, concepts, or objects.)

2. **Edge.** Every graph contains zero or more edges. (These are also sometimes called links, connections, associations, or relationships.) Each edge connects exactly two vertices, unless the edge connects a vertex to itself, which is possible, believe it.

> ¹ The phrase "zero or more" is common in discrete math. In this case, it indicates that the empty graph, which contains no vertices at all, is still a legitimate graph.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 100

Context: # CHAPTER 5. STRUCTURES

## Spatial Positioning

One important thing to understand about graphs is which aspects of a diagram are relevant. Specifically, the spatial positioning of the vertices doesn’t matter. In Figure 5.2 we view Muhammad Ali in the mid-upper left, and Sonny Liston in the extreme upper right. But this was an arbitrary choice, and irrelevant. More specifically, this isn’t part of the information the diagram claims to represent. We could have positioned the vertices differently, as in Figure 5.5, and had the same graph. In both diagrams, there are the same vertices, and the same edges between them (check me). Therefore, these are mathematically the same graph.

```
George Foreman
Sonny Liston
Muhammad Ali
Joe Frazier
```

Figure 5.5: A different look to the same graph as Figure 5.2.

This might not seem surprising for the prize fighter graph, but for graphs like the MapQuest graph, which actually represent physical locations, it can seem jarring. In Figure 5.3 we could have drawn Richmond north of Fredericksburg, and Virginia Beach on the far west side of the diagram, and still had the same graph, provided that all the nodes and links were the same. Just remember that the spatial positioning is designed for human convenience, and isn’t part of the mathematical information. It’s similar to how there’s no order to the elements of a set, even though when we specify a set extensionally, we have to list them in some order to avoid writing all the element names on top of each other. On a graph diagram, we have to draw each vertex somewhere, but where we put it is simply aesthetic.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 101

Context: # 5.1. GRAPHS

## Relationship to Sets

We seem to have strayed far afield from sets with all this graph stuff. But actually, there are some important connections to be made to those original concepts. Recall the wizards set \( A \) from chapter 3 that we extended to contain \{ Harry, Ron, Hermione, Neville \}. Now consider the following endorelation on \( A \):

- (Harry, Ron)
- (Ron, Harry)
- (Ron, Hermione)
- (Ron, Neville)
- (Hermione, Hermione)
- (Neville, Harry)

This relation, and all it contains, is represented faithfully by the graph in Figure 5.6. The elements of \( A \) are the vertices, of course, and each ordered pair of the relation is reflected in an edge of the graph. Can you see how exactly the same information is represented by both forms?

![Figure 5.6: A graph depicting a endorelation.](path_to_image)

Figure 5.6 is a directed graph, of course. What if it were an undirected graph? The answer is that the corresponding relation would be symmetric. An undirected graph implies that if there's an edge between two vertices, it goes "both ways." This is very identical to saying a relation is symmetric: if an \( (x,y) \) is in the relation, then the corresponding \( (y,x) \) must also be. An example is Figure 5.7, which depicts the following symmetric relation:

![Figure 5.7: A graph depicting a symmetric relation.](path_to_image)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 109

Context: # Graphs

## Depth-first traversal (DFT)

1. Choose a starting node.
2. Mark it and push it on an empty stack.
3. While the stack is not empty, do these steps:
   - a) Pop the top node off the stack.
   - b) Visit it.
   - c) Mark and push all of its unmarked adjacent nodes (in any order).

The algorithm in action is shown in Figure 5.11. The stack really made a difference! Instead of alternately exploring Chuck's and Izzy's paths, it bullheadedly darts down Chuck's path as far as it can go, all the way to hitting Izzy's back door. Only then does it back out and visit Izzy. This is because the stack always pops off what it just pushed on, whereas whatever got pushed first has to wait until everything else is done before it gets its chance. That first couple of pushes was critical: if we had pushed Chuck before Izzy at the very beginning, then we would have explored Chuck's entire world before arriving at Chuck's back door, instead of the other way around. As it is, Izzy got put on the bottom, and so she stayed on the bottom, which is inevitable with a stack.

DFT identifies disconnected graphs in the same way as BFT, and it similarly avoids getting stuck in infinite loops when it encounters cycles. The only difference is the order in which it visits the nodes.

## Finding the shortest path

We'll look at two other important algorithms that involve graphs, specifically weighted graphs. The first one is called **Dijkstra's shortest-path algorithm**. This is a procedure for finding the shortest path between two nodes, if one exists. It was invented in 1956 by the legendary computer science pioneer Edsger Dijkstra.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 119

Context: 5.2 Trees
=========

you think that the shortest path between any two nodes would land right on this Prim network? Yet if you compare Figure 5.14 with Figure 5.13 you'll see that the quickest way from Bordeaux to Strasbourg is through Marseille, not Vichy.

So we end up with the remarkable fact that the shortest route between two points has nothing whatsoever to do with the shortest total distance between all points. Who knew?

5.2 Trees
---------

A tree is really nothing but a simplification of a graph. There are two kinds of trees in the world: free trees, and rooted trees.  

### Free trees

A free tree is just a connected graph with no cycles. Every node is reachable from the others, and there’s only one way to get anywhere. Take a look at Figure 5.15. It looks just like a graph (and it is) but unlike the WWII France graph, it’s more skeletal. This is because in some sense, a free tree doesn’t contain anything “extra.”

If you have a free tree, the following interesting facts are true:

1. There’s exactly one path between any two nodes. (Check it!)
2. If you remove any edge, the graph becomes disconnected. (Try it!)
3. If you add any new edge, you end up adding a cycle. (Try it!)
4. If there are n nodes, there are n - 1 edges. (Think about it!)

So basically, if your goal is connecting all the nodes, and you have a free tree, you’re all set. Adding anything is redundant, and taking away anything breaks it.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 123

Context: # 5.2 TREES

parent is A, and A has no parent.

## child

Some nodes have children, which are nodes connected directly below it. A’s children are F and B, C’s are D and E, B’s only child is C, and E has no children.

## sibling

A node with the same parent. E’s sibling is D, B’s is F, and none of the other nodes have siblings.

## ancestor

Your parent, grandparent, great-grandparent, etc., all the way back to the root. B’s only ancestor is A, while E’s ancestors are C, B, and A. Note that F is not C’s ancestor, even though it’s above it on the diagram; there’s no connection from C to F, except back through the root (which doesn’t count).

## descendant

Your children, grandchildren, great-grandchildren, etc., all the way to the leaves. B’s descendants are C, D, and E, while A’s are F, B, C, D, and E.

## leaf

A node with no children. F, D, and E are leaves. Note that in a (very) small tree, the root could itself be a leaf.

## internal node

Any node that’s not a leaf. A, B, and C are the internal nodes in our example.

## depth (of a node)

A node’s depth is the distance (in number of nodes) from it to the root. The root itself has depth zero. In our example, B is of depth 1, E is of depth 3, and A is of depth 0.

## height (of a tree)

A rooted tree’s height is the maximum depth of any of its nodes; i.e., the maximum distance from the root to any node. Our example has a height of 3, since the “deepest” nodes are D and E, each with a depth of 3. A tree with just one node is considered to have a height of 0. Bizarrely, but to be consistent, we’ll say that the empty tree has height -1! Strange, but what else could it be? To say it has height 0 seems inconsistent with a one-node tree also having height 0. At any rate, this won’t come up much.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 135

Context: # 5.2 TREES

![Figure 5.25: NOT a binary search tree, though it looks like one at first glance. (Notice Nancy and Mitch)](path/to/image)

**Possible?** The trick is to realize that with every node you look at, you effectively eliminate half of the remaining tree from consideration. For instance, if we're looking for Molly, we can disregard Mitch's entire left half without even looking at it, then the same for Randi's entire right half. If you discard half of something, then half of the remaining half, it doesn’t take you long before you’ve eliminated almost every false lead.

There’s a formal way to describe this speedup, called **Big-O notation**. The subtleties are a bit complex, but the basic idea is this. When we say that an algorithm is **O(n)** (pronounced "oh-of-n"), it means that the time it takes to execute the algorithm is *proportional* to the number of nodes. This doesn’t imply any specific number of milliseconds or anything — that is highly dependent on the type of computer hardware, the programming language, and a myriad of other things. But what we can say about an **O(n)** algorithm is that if you double the number of nodes, you’re going to approximately double the running time. If you quadruple the number of nodes, you’re going to quadruple the running time. This is what you’d expect.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 144

Context: # CHAPTER 5. STRUCTURES

10. How many vertices and edges are there in the graph below?

   ![Graph](image_reference_placeholder)

   **7 and 10, respectively.**

11. What's the degree of vertex L?  
   **It has an in-degree of 2, and an out-degree of 1.**

12. Is this graph directed?  
   **Yes.**

13. Is this graph connected?  
   **Depends on what we mean. There are two different notions of “connectedness” for directed graphs. One is strongly connected, which means every vertex is reachable from any other by following the arrows in their specified directions. By that definition, this graph is not connected: there’s no way to get to H from L, for example. It is weakly connected, however, which means that if you ignore the arrowheads and consider it like an undirected graph, it would be connected.**

14. Is it a tree?  
   **No. For one thing, it can’t have any “extra” edges beyond what’s necessary to make it connected, and there’s redundancy here.**

15. Is it a DAG?  
   **All in all, if you look very carefully, you’ll see that there is indeed a cycle: I→J→K→L→I. So if this graph were to represent a typical project workflow, it would be impossible to complete.**

16. If we reversed the direction of the I→J edge, would it be a DAG?  
   **Yes. The edges could now be completed in this order: H, G, L, I, M, K, and finally J.**
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 145

Context: # 5.4. EXERCISES

17. If this graph represented an endorelation, how many ordered pairs would it have?  
   **Answer:** 10.

18. Suppose we traversed the graph below in depth-first fashion, starting with node P. In what order would we visit the nodes?

   ![Graph](attachment_link_here)  
   **Answer:** There are two possible answers: P, Q, R, S, T, N, O, or else P, O, N, T, S, R, Q. (The choice just depends on whether we go "left" or "right" initially.) Note in particular that either O or Q is at the very end of the list.

19. Now we traverse the same graph breadth-first fashion, starting with node P. Now in what order would we visit the nodes?

   **Answer:** Again, two possible answers: P, O, Q, N, R, T, S, or else P, Q, R, N, S, T. Note in particular that both O and Q are visited very early.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 149

Context: # Chapter 6

## Counting

If the title of this chapter seems less than inspiring, it’s only because the kind of counting we learned as children was mostly of a straightforward kind. In this chapter, we’re going to learn to answer some more difficult questions like “how many different semester schedules could a college student possibly have?” and “how many different passwords can a customer choose for this e-commerce website?” and “how likely is this network buffer to overflow, given that its packets are addressed to three different destinations?”

The more impressive-sounding name for this topic is **combinatorics**. In combinatorics, we focus on two tasks: counting things (to find out how many there are), and enumerating things (to systematically list them as individuals). Some things turn out to be hard to count but easy to enumerate, and vice versa.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 150

Context: # 6.1 The Fundamental Theorem

We start with a basic rule that goes by the audacious name of **The Fundamental Theorem of Counting**. It goes like this:

If a whole can be divided into \( k \) parts, and there’s \( n_i \) choices for the \( i \)-th part, then there’s \( n_1 \times n_2 \times n_3 \times \cdots \times n_k \) ways of doing the whole thing.

## Example

Jane is ordering a new Lamborghini. She has twelve different paint colors to choose from (including Luscious Red and Sassy Yellow), three different interiors (Premium Leather, Bonded Leather, or Vinyl), and three different stereo systems. She must also choose between automatic and manual transmission, and she can get power locks & windows (or not). How many different configurations does Jane have to choose from? 

To put it another way, how many different kinds of cars could come off the line for her?

The key is that every one of her choices is independent of all the others. Choosing an Envious Green exterior doesn't constrain her choice of transmission, stereo, or anything else. So no matter which of the 12 paint colors she chooses, she can independently choose any of the three interiors, and no matter what these first two choices are, she can freely choose any of the stereos, etc. It’s a mix-and-match. Therefore, the answer is:

\[ 
12 \times 3 \times 3 \times 2 = 432 
\]

Here’s an alternate notation you’ll run into for this, by the way:

**How many other "Fundamental Theorems" of math do you know?** Here are a few: the Fundamental Theorem of Arithmetic says that any natural number can be broken down into its prime factors in only one way. The Fundamental Theorem of Algebra says that the highest power of a polynomial is how many roots (zeros) it has. The Fundamental Theorem of Linear Algebra says that the row space and the column space of a matrix have the same dimension. The Fundamental Theorem of Calculus says that integration and differentiation are the inverses of each other.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 151

Context: # 6.1. THE FUNDAMENTAL THEOREM

which is just a shorter way of writing

\[ n_1 \times n_2 \times n_3 \cdots \times n_k. \]

As mentioned in section 4.5, the \( \Sigma \) notation is essentially a loop with a counter, and it says to add up the expression to the right of it for each value of the counter. The \( \Pi \) notation is exactly the same, only instead of adding the expressions together for each value of the counter, we're multiplying them. (The reason mathematicians chose the symbols \( \Sigma \) (sigma) and \( \Pi \) (pi) for this, by the way, is that “sigma” and “pi” start with the same letter as “sum” and “product,” respectively.)

We can actually get a lot of leverage just with the fundamental theorem. How many different PINs are possible for an ATM card? There are four digits, each of which can be any value from 0 to 9 (ten total values), so the answer is:

\[ 10 \times 10 \times 10 \times 10 = 10,000 \text{ different PINs.} \]

So a thief at an ATM machine frantically entering PINs at random (hoping to break your account before you call and stop your debit card) would have to try about 5,000 of them on average before cracking the code.

What about middle school bullies who are trying to break into your locker? Well, most combination locks are opened by a three-number sequence, each number of which is anything from 0 to 39. So there are:

\[ 40 \times 40 \times 40 = 64,000 \text{ different combinations.} \]

That's probably slightly overstated, since I'll bet consecutive repeat numbers are not allowed (Master probably doesn't manufacture a
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 152

Context: # CHAPTER 6. COUNTING

Every car in the state of Virginia must be issued its own license plate number. That's a lot of cars. How many different license plate combinations are available?

This one requires a bit more thought, since not all license numbers have the same number of characters. In addition to `SBE4756` and `PY9127`, you can also have `DAM6` or `LUV6` or even `U2`. How can we incorporate these?

The trick is to divide up our sets into mutually exclusive subsets, and then add up the cardinalities of the subsets. If only 7 characters fit on a license plate, then clearly every license plate number has either 1, 2, 3, 4, 5, 6, or 7 characters. And no license plate has two of these (i.e., there is no plate that is both 5 characters long and 6 characters long). Therefore they’re mutually exclusive subsets, and safe to add. This last point is often not fully appreciated, leading to errors. Be careful not to cavalierly add the cardinalities of non-mutually-exclusive sets! You’ll end up double-counting.

So we know that the number of possible license plates is equal to:

- The # of 7-character plates
- The # of 6-character plates
- The # of 5-character plates
- ...
- The # of 1-character plates.

Very well. Now we can figure out each one separately. How do we know how many 7-character plates there are? Well, if every character must be either a letter or a digit, then we have \(26 + 10 = 36\) choices for each character. This implies \(36^7\) different possible 7-character license plates. The total number of plates is therefore:

\[
36^7 + 36^6 + 36^5 + 36^4 + 36^3 + 36^2 + 36^1 = 60,603,140,212 \text{ plates}
\]

which is about ten times the population of the earth, so I think we're safe for now.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 156

Context: # CHAPTER 6. COUNTING

This pattern comes up so much that mathematicians have established a special notation for it:

\[ n \times (n - 1) \times (n - 2) \cdots \times 1 = n! \; (\text{``n-factorial''}) \]

We say there are “3-factorial” different brushing orders for the Davies kids. For our purposes, the notion of factorial will only apply for integers, so there’s no such thing as 23.46! or π! (In advanced computer science applications, however, mathematicians sometimes do define factorial for non-integers.) We also define 0! to be 1, which might surprise you.

This comes up a heck of a lot. If I give you a jumbled set of letters to unscramble, like “KRBS” (think of the Jumble® word game in the newspaper), how many different unscramblings are there? The answer is 5! or 120, one of which is BRISK. Let's say I shuffle a deck of cards before playing War. How many different games of War are there? The answer is 52!, since any of the cards in the deck might be shuffled on top, then any but that top card could be second, then any of those two could be third, etc. Ten packets arrive near-simultaneously at a network router. How many ways can they be queued up for transmission? 10! ways, just like a larger Davies family.

The factorial function grows really, really fast, by the way, even faster than exponential functions. A five letter word like “BRISK” has 120 permutations, but “AMBIENTDROUSLY” has 87,178,291,200, ten times the population of the earth. The number of ways to shuffle a deck is 

\[ 52! \approx 8.065 \times 10^{67} \]

so I don’t think my boys will end up playing the same War game twice any time soon, nor my wife and I the same bridge hand.

---

**“War”** is a mindless card game which involves no strategy or decision-making on the part of the players. Once you shuffle the initial deck, the entire outcome of the game is fixed.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 161

Context: # 6.2. PERMUTATIONS

This is pronounced “n-to-the-k-falling,” and was invented by one of the most brilliant computer scientists in history, Donald Knuth.

To keep straight what \( n^k \) means, think of it as the same as plain exponentiation, except that the product diminishes instead of staying the same. For example, “17-to-the-6th” is

\[
17^6 = 17 \cdot 17 \cdot 17 \cdot 17 \cdot 17 \cdot 17
\]

but “17-to-the-6th-falling” is

\[
17^4 = 17 \cdot 16 \cdot 15 \cdot 14 \cdot 13 \cdot 12.
\]

In both cases, you’re multiplying the same number of terms; it’s just that in the second case, these terms are “falling.”

Anyway, notation aside, partial permutations abound in practice. A late night movie channel might show four classic films back to back every evening. If there are 500 films in the studio’s library, how many nightly TV schedules are possible? Answer: \( 500^4 \) since there are 500 choices of what to show at 7 pm, then 499 choices for 9 pm, 498 for 11 pm, and 497 for the 1 am late show.

The fastest 41 auto racers will qualify for Sunday’s race, and will be placed from Pole Position on down depending on their qualifying time. If 60 cars participate in the qualifying heat, then there are \( 60! \) different possible starting configurations for Sunday.

Middle schoolers entering sixth grade will be assigned a semester schedule that consists of five “blocks” (periods), each of which will have one of thirteen classes (science, math, orchestra, study hall, etc.). How many schedules are possible? You guessed it, \( 13^5 \). Notice that this is the correct answer only because no repeats are allowed: we don’t want to schedule any student for American History more than once. If a student could take the same class more than once in a day, then there would be \( 13^5 \) (not “falling”) different possible schedules.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 165

Context: ```
# 6.3. COMBINATIONS

The coefficients for this binomial are of course 1 and 1, since "n" really means "1 . n." Now if we multiply this by itself, we get:

$ (x + y) \cdot (x + y) = 2x^2 + 2xy + y^2 $,

the coefficients of the terms being 1, 2, and 1. We do it again:

$ (x^2 + 2xy + y^2) \cdot (x + y) = x^3 + 3x^2y + 3xy^2 $ 

to get 1, 3, 3, and 1, and do it again:

$ (x^3 + 3x^2y + 3xy^2 + y^3) \cdot (x + y) = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 $ 

to get 1, 4, 6, and 4. At this point you might be having flashbacks to Pascal's triangle, which perhaps you learned about in grade school, in which each entry in a row is the sum of the two entries immediately above it (to the left and right), as in Figure 6.1. (If you never learned that, don't worry about it.)

```
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
```
**Figure 6.1:** The first six rows of Pascal's triangle.

Now you might be wondering where I'm going with this. What do fun algebra tricks have to do with counting combinations of items? The answer is that the values of $ \binom{n}{k} $ are precisely the coefficients of these multiplied polynomials. Let n be 4, which corresponds to the last polynomial we multiplied out. We can then compute all the combinations of items taken from a group of four:

- $ \binom{4}{0} = 1 $ 
- $ \binom{4}{1} = 4 $
- $ \binom{4}{2} = 6 $ 
- $ \binom{4}{3} = 4 $ 
- $ \binom{4}{4} = 1 $ 

In other words, there is exactly **one** way of taking no items out of 4 (you simply don't take any). There are **four** ways of taking one out of 4.
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 167

Context: # 6.4 Summary

Most of the time, counting problems all boil down to a variation of one of the following three basic situations:

- \( n^k \) — this is when we have \( k \) different things, each of which is free to take on one of \( n \) completely independent choices.

- \( \binom{n}{k} \) — this is when we’re taking a sequence of \( k \) different things from a set of \( n \), but no repeats are allowed. (A special case of this is \( n! \), when \( k = n \).)

- \( \binom{n}{k} \) — this is when we’re taking \( k \) different things from a set of \( n \), but the order doesn’t matter.

Sometimes it’s tricky to deduce exactly which of these three situations apply. You have to think carefully about the problem and ask yourself whether repeated values would be allowed and whether it matters what order the values appear in. This is often subtle.

As an example, suppose my friend and I work out at the same gym. This gym has 18 different weight machines to choose from, each of which exercises a different muscle group. Each morning,
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 170

Context: # CHAPTER 6. COUNTING

1. Go back to when the child did have to choose something from each category, but now say they can choose any number of accessories (so they could have the wizard's cape, a batman helmet, plus a lightsaber, pipe, and scepter). Now how many costumes are there?

   This is \(4 \times 5 \times 2^n\), or a whopping 10,240 for those of you keeping score. The \(2^n\) changed to a \(2^n\) because now for each accessory, a costume might include it, or exclude it. That's two independent choices for each accessory.

2. Okay, that’s overkill. A kid only has two hands, after all, so handling nine accessories would be a disastrous challenge. Let’s say instead that a child can choose up to three accessories (but must have at least one). Now how many costume choices are there?

   Now it’s \(4 \times 5 \times \left( \binom{9}{1} + \binom{9}{2} + \binom{9}{3} \right)\), which is equal to \(4 \times 5 \times (9 + 36 + 84)\) or 2,580 possible costumes.

3. Ignoring the at-least-one-child-and-adult constraint for the moment, the total number of groups would seem to be \( \binom{7}{3} + \binom{7}{2} = 0 + 2380 + 6188 = 9,238 possible groups. But of course this is an overcount, since it includes groups with no children and groups with no adults. We’ll use the trick from p. 146 to subtract those out. Now how many size-3-to-5 groups with no adults (all kids) are there? \( \binom{3}{1} + \binom{3}{2} + \binom{3}{3} = 41\). Therefore, by the 14 trick, the total number of legal groups is 9248 - 957 - 41 = 8,250. Final answer.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 175

Context: # 7.1. WHAT IS A “NUMBER?”

When you think of a number, I want you to try to erase the sequence of digits from your mind. Think of a number as what is: a **quantity**. Here's what the number seventeen really looks like:

```
   8
 8 8 8
   8
```

It’s just an **amount**. There are more circles in that picture than in some pictures, and less than in others. But in no way is it “two digits,” nor do the particular digits “1” and “7” come into play any more or less than any other digits.

Let’s keep thinking about this. Consider this number, which I’ll label “A”:

(A)

Now let’s add another circle to it, creating a different number I’ll call “B”:

(B)

And finally, we’ll do it one more time to get “C”:

(C)

(Look carefully at those images and convince yourself that I added one circle each time.)

When going from A to B, I added one circle. When going from B to C, I also added one circle. Now I ask you: was going from B to C any more “significant” than going from A to B? Did anything qualitatively different happen?

The answer is obviously no. Adding a circle is adding a circle; there’s nothing more to it than that. But if you had been writing
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 188

Context: # CHAPTER 7. NUMBERS

10001

1. (Step 2) Divide 3 by 2 and take the floor, which gives \(\lfloor 3/2 \rfloor = 1\). This still isn't zero, so make 1 our new value, move our pencil to the left of the 0, and go back to step 1.

2. (Step 1) Compute 1 mod 2. Since 1 is odd, this is one, which we write down:

   110001

3. (Step 2) Divide 1 by 2 and take the floor, which gives \(\lfloor 1/2 \rfloor = 0\). We're done. The final answer is 110001₂. Double-checking our work, we verify that indeed 32 plus 16 plus 1 gives 49, which is what we started with.

## Converting to and from hex

That was pretty tedious. But converting back and forth from binary to hex is a snap. That’s because 16 is \(2^4\), and so one hex digit is exactly equal to four binary digits. This isn’t the case with base 10, where one decimal digit is equal to three binary digits… plus a little extra. This “not quite a whole number of digits” thing is what makes converting from decimal to binary (or decimal to hex, for that matter) so awkward.

We most commonly deal with sets of eight bits at a time, which is called a byte. (This is the fundamental unit of storage on pretty much every computer on earth.) Suppose I had the following byte:

10000010₂

Because one hex digit is exactly equal to four bits, this byte is exactly equal to:

86₁₆
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 197

Context: 7.4. BINARY (BASE 2)
=====================

will contain one more digit than the original numbers did. You’ve probably seen this on a hand calculator when you press “=” and get an “E” (for “error”) in the display. If there are only ten digits on your display, adding two ten-digit numbers will (sometimes) result in an eleven-digit number that your calculator can’t display, and it’s alerting you to that fact so you don’t misinterpret the result. Here, we might add two 8-bit quantities and end up with a 9-bit quantity that can’t fit in one byte. This situation is called overflow, and we need to detect when it occurs.

The rules for detecting overflow are different depending on the scheme. For unsigned numbers, the rule is simple: if a 1 is carried out from the MSB (far left-side), then we have overflow. So if I were to try to add \( 155_{10} \) and \( 108_{10} \):

```
   1111
+ 10011011  ←  155_{10}
+ 01101100  ←  108_{10}
-----------
  100000111
```

then I get a carry out left into the 9th digit. Since we can only hold eight digits in our result, we would get a nonsensical answer (15_{10}), which we can detect as bogus because the carry out indicated overflow.

Sign-magnitude works the same way, except that I have one fewer bit when I’m adding and storing results. (Instead of a byte’s worth of bits representing magnitude, the left-end bit has been reserved for a special purpose: indicating the number’s sign.) Therefore, if I add the remaining 7-bit quantities and get a carry out left into the eighth digit, that would indicate overflow.

Now with two’s-complement, things are (predictably) not that easy. But in this case, they’re almost as easy. There’s still a simple rule to detect overflow; it’s just a different rule. The rule is: if the carry in to the last (left-most) bit is different than the carry out from the last bit, then we have overflow.

Let’s try adding \( 103_{10} \) and \( 95_{10} \) in two’s-complement, two numbers:
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 201

Context: # 7.5 EXERCISES

6. If I told you that 98,243,917,215 mod 7 was equal to 1, would you call me a liar without even having to think too hard?  
   No, you shouldn’t. It turns out that the answer is 3, not 1, but how would you know that without working hard for it?  

7. If I told you that 273,111,999,214 mod 6 was equal to 6, would you call me a liar without even having to think too hard?  
   Yes, you should. Any number mod 6 will be in the range 0 through 5, never 6 or above. (Think in terms of repeatedly taking out groups of six from the big number. The mod is the number of stones you have left when there are no more whole groups of six to take. If towards the end of this process there are six stones left, that’s not a remainder because you can get another whole group!)

8. Are the numbers 18 and 25 equal?  
   Of course not. Don’t waste my time.

9. Are the numbers 18 and 25 congruent mod 7?  
   Yes. If we take groups of 7 out of 18 stones, we’ll get two such groups (a total of 14 stones) and have 4 left over. And then, if we do that same with 25 stones, we’ll get three such groups (a total of 21 stones) and again have 4 left over. So they’re not congruent mod 7.

10. Are the numbers 18 and 25 congruent mod 6?  
    No. If we take groups of 6 out of 18 stones, we’ll get three such groups with nothing left over. But if we start with 25 stones, we’ll take out 4 such groups (for a total of 24 stones) and have one left over. So they’re not congruent mod 6.

11. Are the numbers 617,418 and 617,424 equal?  
    Of course not. Don’t waste my time.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 205

Context: # Chapter 8
## Logic

To a great extent, logic governs the way your mind works, even among so-called “irrational people.” If we want to capture logical processes and represent them in a computer program, we need a way to express these thoughts in a form suitable for automated reasoning. This is primarily why computer scientists study logic.

Interestingly, the material in this chapter covers the very bottom and the very top of the technology stack. At the bottom, we have actual physical hardware that consists of circuits turning bits on and off. The rules that govern when we want to turn which bits on and off are based on “logic gates,” or tiny physical devices that implement the logical principles of this chapter on a micro scale. At the other end of the spectrum, we have highly abstract programs aiming towards “artificial intelligence.” These systems are centered around a “knowledge base” of accumulated facts, and regularly examine those known facts to make decisions and draw additional conclusions. What does a knowledge base consist of? You guessed it: logical statements that are described in this chapter.

### 8.1 Propositional logic

The simpler — but less powerful — of the two logic systems we’ll study is called **propositional logic**. It has this name because the core building block is the **proposition**. A proposition is simply a
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 207

Context: # 8.1. PROPOSITIONAL LOGIC

- \( \land \) — "and"
- \( \neg \) — "not"
- \( \lor \) — "or"
- \( \Rightarrow \) — "implies" (or "if...then...")
- \( \oplus \) — "xor" (exclusive "or")
- \( \equiv \) — "equiv" (equivalent)

Just as the ordinary arithmetic operators (\(+\), \(-\), etc.) can be used to join numbers and produce another number, and just as the set operators can be used to join sets and produce another set, the logical operators can be used to join propositions and produce another proposition. 

The expression \( 34 + 59 \) produces the number 93. The expression \( \{X, Y\} \cup \{Z\} \) produces the set \( \{X, Y, Z\} \). And the expression \( A \land B \) produces the value **false**, since although \( UMW \) is located in Virginia, the King is not female.

Let's run through the six operators, some of which are intuitive and some of which are not:

### \( \land \) ("and")
The proposition \( X \land Y \) is true when both \( X \) and \( Y \) are true propositions. "AAC" represents the proposition "UMW is in Virginia and dogs are carnivores," which has a truth value of **true** since both components are true. This operation is sometimes called a **conjunction**. Notice that the "∧" sign somewhat resembles the "∩" sign for set intersection. This is not an accident. An element is in the intersection of two sets if it is a member of the first and the second set. Hence mathematicians have chosen symbols which reinforce this connection.

### \( \lor \) ("or")
The proposition \( X \lor Y \) is true when either \( X \) or \( Y \) (or both) are true propositions. "BVC" represents the proposition "The King of England is female or dogs are carnivores," which has a truth value of **true** since the second component is true. This operation is sometimes called a **disjunction**. The \( \lor \) looks somewhat like the "U" sign for union, since an element is in the union of two sets if it is an element of the first set or the second set (or both). This operator is sometimes called an **inclusive or** since it is true if both propositions are true.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 216

Context: # CHAPTER 8. LOGIC

| X | Y | X ∧ Y | ¬(X ∧ Y) | ¬X | ¬Y | X ∨ Y | ¬(X ∨ Y) | (X ∨ Y) ↔ ¬(X ∧ Y) |
|---|---|-------|-----------|----|----|-------|-----------|-----------------------|
| 0 | 0 |  0    |     1     |  1 |  1 |  0    |     1     |          1            |
| 0 | 1 |  0    |     1     |  1 |  0 |  1    |     0     |          1            |
| 1 | 0 |  0    |     1     |  0 |  1 |  1    |     0     |          1            |
| 1 | 1 |  1    |     0     |  0 |  0 |  1    |     0     |          1            |

This is one of De Morgan's Laws, which we've seen previously with regards to sets (p. 21). Here is the other:

| X | Y | X ∧ Y | ¬(X ∧ Y) | ¬X | ¬Y | X ∨ Y | ¬(X ∨ Y) | (X ∨ Y) ↔ ¬(X ∧ Y) |
|---|---|-------|-----------|----|----|-------|-----------|-----------------------|
| 0 | 0 |  0    |     1     |  1 |  1 |  0    |     1     |          1            |
| 0 | 1 |  0    |     1     |  1 |  0 |  1    |     0     |          1            |
| 1 | 0 |  0    |     1     |  0 |  1 |  1    |     0     |          1            |
| 1 | 1 |  1    |     0     |  0 |  0 |  1    |     0     |          1            |

The first can be expressed as “the negation of the disjunction is equal to the conjunction of the negations,” and the second as “the negation of the conjunction is equal to the disjunction of the negations.” If that helps at all.

One last identity is this one:

| X | Y | Z | X ∧ Y ∨ Z | X ∧ (Y ∨ Z) | X ∧ Z | (X ∧ Y) ∨ (X ∧ Z) | A* |
|---|---|---|-----------|-------------|-------|--------------------|----|
| 0 | 0 | 0 |     0     |      0      |   0   |         0          |  1 |
| 0 | 0 | 1 |     1     |      0      |   0   |         0          |  1 |
| 0 | 1 | 0 |     0     |      0      |   0   |         0          |  1 |
| 0 | 1 | 1 |     1     |      0      |   0   |         0          |  1 |
| 1 | 0 | 0 |     1     |      0      |   0   |         0          |  1 |
| 1 | 0 | 1 |     1     |      1      |   1   |         1          |  1 |
| 1 | 1 | 0 |     1     |      1      |   0   |         1          |  1 |
| 1 | 1 | 1 |     1     |      1      |   1   |         1          |  1 |

*Here, "A" is X ∧ (Y ∨ Z) ⇔ (X ∧ Y) ∨ (X ∧ Z).

This is none other than the distributive law, which we also saw for set union and intersection (p. 20) and which you should also remember from introductory algebra: \( x \cdot (y + z) = x \cdot y + x \cdot z \).

It's interesting, actually, when you compare the distributive law from algebra to the distributive law for logic:

\[ x \land (y \lor z) \equiv (x \land y) \lor (x \land z) \]

\( X \land (Y \lor Z) \equiv (X \land Y) \lor (X \land Z) \)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 217

Context: # 8.2 Predicate logic

The `∧` operator is analogous to `+` (times), while `∨` corresponds to `+` (plus). In fact, if you look at the truth tables for these two operators again, you’ll see an uncanny resemblance:

| X | Y | X ∧ Y | X ∨ Y |
|---|---|-------|-------|
| 0 | 0 |   0   |   0   |
| 0 | 1 |   0   |   1   |
| 1 | 0 |   0   |   1   |
| 1 | 1 |   1   |   1   |

Except for the `(1)` that I put in parentheses, this truth table is exactly what you’d get if you mathematically multiplied (`∧`) and added (`∨`) the inputs! At some level, logically “and-ing” is multiplying, while “or-ing” is adding. Fascinating.

## 8.2 Predicate logic

Propositional logic can represent a lot of things, but it turns out to be too limiting to be practically useful. That has to do with the atomic nature of propositions. Every proposition is its own opaque chunk of truthhood or falsity, with no way to break it down into constituent parts. 

Suppose I wanted to claim that every state in the union had a governor. To state this in propositional logic, I’d have to create a brand new proposition for each state:

- Let G1 be the proposition that Alabama has a governor.
- Let G2 be the proposition that Alaska has a governor.
- Let G3 be the proposition that Arizona has a governor.
- ...

and then, finally, I could assert:

G1 ∧ G2 ∧ G3 ∧ ... ∧ G50.

That’s a lot of work just to create a whole bunch of individual propositions that are essentially the same. What we need is some
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 219

Context: # 8.2. Predicate Logic

which is perfectly true[^1].

You may recall the word "predicate" from your middle school grammar class. Every sentence, remember, has a subject and a predicate. In “Billy jumps,” “Billy” is the subject, and “jumps” the predicate. In “The lonely boy ate spaghetti with gusto,” we have “the lonely boy” as the subject and “ate spaghetti with gusto” as the predicate. Basically, a predicate is anything that can describe or affirm something about a subject. Imagine asserting “JUMPS(Billy)” and “ATESPAGHETTIWITHGUSTO(lonely boy).”

A predicate can have more than one input. Suppose we define the predicate `IsFanOf` as follows:

Let `IsFanOf(x, y)` be the proposition that x digs the music of rock band y.

Then I can assert:

- `IsFanOf(Stephen, Led Zeppelin)`
- `IsFanOf(Rachel, The Beatles)`
- `IsFanOf(Stephen, The Beatles)`
- `¬IsFanOf(Stephen, The Rolling Stones)`

We could even define `TraveledToByModeInYear` with a bunch of inputs:

Let `TraveledToByModeInYear(p, d, m, y)` be the proposition that person p traveled to destination d by mode m in year y.

The following statements are then true:

- `TraveledToByModeInYear(Stephen, Richmond, car, 2017)`

> "By the way, when I say you can give any input at all to a predicate, I mean any individual element from the domain of discourse. I don’t mean that a set of elements can be an input. This limitation is why it’s called ‘first-order’ predicate logic. If you allow sets to be inputs to predicates, it’s called ‘second-order predicate logic’, and can get quite messy." 

[^1]: The footnote text if needed.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 222

Context: 214  
# CHAPTER 8. LOGIC

So these two cases both result in true. But perhaps surprisingly, we also get true for oatmeal:

- **Human**(oatmeal) → **Adult**(oatmeal) ⊕ **Child**(oatmeal)  
  false → false ⊕ false  
  false → false  
  true ✓  

Whoa, how did true pop out of that? Simply because the premise was false, and so all bets were off. We effectively said “if a bowl of oatmeal is human, then it will either be an adult or a child. But it’s not, so never mind!” Put another way, the bowl of oatmeal did not turn out to be a counterexample, and so we’re confident claiming that this expression is true “for all \( x \): \( \forall \)”.

The other kind of quantifier is called the **existential quantifier**. As its name suggests, it asserts the existence of something. We write it \( \exists \) and pronounce it “there exists.” For example,

- \( \exists \, \text{HasGovernor}(x) \) asserts that there is at least one state that has a governor. This doesn’t tell us how many states this is true for, and in fact despite their name, quantifiers really aren’t very good at “quantifying” things for us, at least numerically. As of 2008, the statement

- \( \text{President}(x) \land \text{African-American}(x) \)

is true, and always will be, no matter how many more African-American U.S. presidents we have. Note that in compound expressions like this, a variable (like \( x \)) always stands for a single entity wherever it appears. For hundreds of years there existed African-Americans, and there have existed Presidents, so the expression above would be ridiculously obvious if it meant only “there have been Presidents, and there have been African-Americans.” But the same variable \( x \) being used as inputs to both predicates is what seals the deal and makes it represent the much stronger statement: “there is at least one individual who is personally both African-American and President of the United States at the same time.”
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 224

Context: # CHAPTER 8. LOGIC

## Order matters

When you're facing an intimidating morass of \( \forall \)s and \( \exists \)s and \( \vee \)s and \( \wedge \)s, and God knows what else, it's easy to get lost in the sauce. But you have to be very careful to dissect the expression to find out what it means. Consider this one:

\[
\forall x \in \mathbb{R}, \exists y \in \mathbb{R} \quad y = x + 1.
\] (8.5)

This statement is true. It says that for every single real number (call it \( x \)), it's true that you can find some other number (call it \( y \)) that's one greater than it. If you generate some examples it's easy to see this is true. Suppose we have the real number \( x = 5 \). Is there some other number \( y \) that's equal to \( 2 + x \)? Of course, the number 6. What if \( x = -32.4 \)? Is there a number \( y \) that satisfies this equation? Of course, \( y = -31.4 \). Obviously not matter what number \( x \) we choose, we can find the desired number \( y \) just by adding one. Hence this statement is true for all \( x \), just like it says.

What happens, though, if we innocently switch the order of the quantifiers? Let's try asserting this:

\[
\exists y \in \mathbb{R}, \forall x \in \mathbb{R} \quad y = x + 1.
\] (8.6)

Is this also true? Look carefully. It says “there exists some magic number \( y \) that has the following amazing property: no matter what value of \( x \) you choose, this \( y \) is one greater than \( x \)!” Obviously this is not true. There is no such number \( y \). If I choose \( y = 13 \), that works great as long as I choose \( x = 12 \), but for any other choice of \( x \), it’s dead in the water.

## The value of precision

This fluency with the basic syntax and meaning of predicate logic was our only goal in this chapter. There are all kinds of logical rules that can be applied to predicate logic statements in order to
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 226

Context: # Chapter 8: Logic

Here's a funny one I’ll end with. Consider the sentence **“He made her duck.”** What is intended here? Did some guy reach out with his hand and forcefully push a woman’s head down out of the way of a screaming projectile? Or did he prepare a succulent dish of roasted fowl to celebrate her birthday? Oh, if the computer could only know. If we’d used predicate logic instead of English, it could!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 230

Context: # CHAPTER 8. LOGIC

|   | Statement                                                                                              |
|---|-------------------------------------------------------------------------------------------------------|
| 35| True or false: ∀x Professor(x) ⇔ Human(x). <br> **True!** This is what we were trying to say all along. Every professor is a person. |
| 36| True or false: ¬∃x Professor(x) → ¬Human(x). <br> **True!** This is an equivalent statement to item 35. There's nothing in the universe that is a professor yet not a human. (At least, at the time of this writing.) |
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 231

Context: # Chapter 9

## Proof

We've seen a lot of pretty sights on our cool brisk walk. We've caught a glimpse of the simple elegance of sets and relations, the precision of probabilistic reasoning, the recursive structure of trees, the explosive nature of combinatorics, and much more. None of these things have we plumbed to the depths, but we've appreciated their beauty and taken note of where they stood along our blazed trail. You’ll remember this hike when you run into such concepts again and again in future computer science and math courses, and in your career beyond academics.

Now we have one more step to make before returning to the trailhead, and that deals with the notion of **proof**. As we've studied these various mathematical entities, I’ve pointed out certain of their properties. A free tree has one more vertex than edge, for example. The cardinality of the union of two sets is at least as big as each of their individual unions. If you flip-all-the-bits-and-add-one in a two's complement scheme, and then perform that flip-and-add operation again, you’ll return to the original number. But with a few exceptions, we haven't proven any of these things. I’ve just stated them, and you’ve taken them on faith.

In order to establish reliable truth, of course, professional mathematicians aren’t satisfied with unsubstantiated statements. They need to be convinced that the claims we make do truly hold, and provably so, in all circumstances. What they seek is a **proof** of a...
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 233

Context: # 9.1. PROOF CONCEPTS

The phone is in my pocket, and has not rung, and I conclude that the plan has not changed. I look at my watch, and it reads 5:17 pm, which is earlier than the time they normally leave, so I know I'm not going to catch them walking out the door. This is, roughly speaking, my thought process that justifies the conclusion that the house will be empty when I pull into the garage.

Notice, however, that this prediction depends precariously on several facts. What if I spaced out the day of the week, and this is actually Thursday? All bets are off. What if my cell phone battery has run out of charge? Then perhaps they did try to call me but couldn’t reach me. What if I set my watch wrong and it’s actually 4:17 pm? Etc. Just like a chain is only as strong as its weakest link, a whole proof falls apart if even one step isn’t reliable.

Knowledge bases in artificial intelligence systems are designed to support these chains of reasoning. They contain statements expressed in formal logic that can be examined to deduce only the new facts that logically follow from the old. Suppose, for instance, that we had a knowledge base that currently contained the following facts:

1. \( A = C \)
2. \( \neg (C \land D) \)
3. \( (F \lor E) \to D \)
4. \( A \lor B \)

These facts are stated in propositional logic, and we have no idea what any of the propositions really mean, but then neither does the computer, so hey. Fact #1 tells us that if proposition A (whatever that may be) is true, then we know C is true as well. Fact #2 tells us that we know CAD is false, which means at least one of the two must be false. And so on. Large knowledge bases can contain thousands or even millions of such expressions. It’s a complete record of everything the system “knows.”

Now suppose we learn an additional fact: \( \neg B \). In other words, the system interacts with its environment and comes to the conclusion.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 235

Context: ```
9.2 TYPES OF PROOF
==================

to be true, and so it is legal grounds from which to start. A proof can't even get off the ground without axioms. For instance, in step 1 of the above proof, we noted that either A or B must be true, and so if B isn't true, then A must be. But we couldn't have taken this step without knowing that disjunctive syllogism is a valid form of reasoning. It's not important to know all the technical names of the rules that I included in parentheses. But it is important to see that we made use of an axiom of reasoning on every step, and that if any of those axioms were incorrect, it could lead to a faulty conclusion.

When you create a valid proof, the result is a new bit of knowledge called a theorem which can be used in future proofs. Think of a theorem like a subroutine in programming: a separate bit of code that does a job and can be invoked at will in the course of doing other things. One theorem we learned in chapter 2 was the distributive property of sets; that is, \( X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z) \). This can be proven through the use of Venn diagrams, but once you've proven it, it's accepted to be true, and can be used as a "given" in future proofs.

## 9.2 Types of Proof

There are a number of accepted "styles" of doing proofs. Here are some important ones:

### Direct Proof

The examples we've used up to now have been direct proofs. This is where you start from what's known and proceed directly by positive steps towards your conclusion.

Direct proofs remind me of a game called "word ladders," invented by Lewis Carroll, that you might have played as a child:

```
WARM
  |
  |
? ? ? ?
  |
  |
COLD
```
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 238

Context: 230  
# CHAPTER 9. PROOF  

An irrational number is one that **cannot** be expressed as the ratio of two integers, no matter what the integers are.  

Proving this directly seems pretty hard, since how do you prove that there aren't any two integers whose ratio is \(\sqrt{2}\), no matter how hard you looked? I mean, \(534,927\) and \(378,250\) are pretty dang close:  

\[
\left( \frac{534927}{378250} \right)^2 = 2.000005.
\]

How could we possibly prove that no matter how hard we look, we can never find a pair that will give it to us exactly?  

One way is to assume that \(\sqrt{2}\) is a rational number, and then prove that down that path lies madness. It goes like this: Suppose \(\sqrt{2}\) is rational, after all. That means that there must be two integers, call them \(a\) and \(b\), whose ratio is exactly equal to \(\sqrt{2}\):  

\[
\frac{a}{b} = \sqrt{2}.
\]

This, then, is the starting point for our indirect proof. We're going to proceed under this assumption and see where it leads us.  

By the way, it's clear that we could always reduce this fraction to lowest terms in case it's not already. For instance, if \(a = 6\) and \(b = 4\), then our fraction would be \(\frac{3}{2}\), which is the same as \(\frac{6}{4}\), so we could just say \(a = 3\) and \(b = 2\) and start over. Bottom line: if \(\sqrt{2}\) is rational, then we can find two integers \(a\) and \(b\) that have no common factor (if they do have a common factor, we'll just divide it out of both of them and go with the new numbers) whose ratio is \(\sqrt{2}\).  

Okay then. But now look what happens. Suppose we square both:
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 240

Context: 232  
# CHAPTER 9. PROOF  

“recursion,” because this is exactly what it is. Remember that we discussed recursion in the context of rooted trees (see p.116). A tree can be thought of as a node with several children—each of which are, in turn, trees. Each of them is the root node of a tree comprised of yet smaller trees, and so on and so forth. If you flip back to the left-hand side of Figure 5.16 on p.113, you’ll see that A is the root of one tree, and its two children, F and B, are roots of their own smaller trees in turn. If we were to traverse this tree in (say) pre-order, we’d visit the root, then visit the left and right subtrees in turn, treating each of them as their own tree. In this way we’ve broken up a larger problem (traversing the big tree) into smaller problems (traversing the smaller trees F and B). The A node has very little to do: it just visits itself, then defers all the rest of the work onto its children. This idea of pawning off most of the work onto smaller subproblems that you trust will work is key to the idea of inductive proofs.

Mathematical induction is hard to wrap your head around because it feels like cheating. It seems like you never actually prove anything: you defer all the work to someone else, and then declare victory. But the chain of reasoning, though delicate, is strong as iron.

## Casting the problem in the right form  

Let’s examine that chain. The first thing you have to be able to do is express the thing you’re trying to prove as a predicate about natural numbers. In other words, you need to form a predicate that has one input, which is a natural number. You’re setting yourself up to prove that the predicate is true for all natural numbers. (Or at least, all natural numbers of at least a certain size.)

Suppose I want to prove that in the state of Virginia, all legal drinkers can vote. Then I could say “let VOTE(n) be the proposition that a citizen of age n can vote.”

If I want to prove an algebraic identity, like  

\[
\sum_{i=1}^{n} \frac{i}{n+1} = \frac{n(n+1)}{2}, 
\]  

then I have to figure out which variable is the one that needs to vary across the natural numbers. In this case it’s the \( n \) variable in my equation.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 244

Context: # Example 2

A famous story tells of Carl Friedrich Gauss, perhaps the most brilliant mathematician of all time, getting in trouble one day as a schoolboy. As punishment, he was sentenced to tedious work: adding together all the numbers from 1 to 100. To his teacher's astonishment, he came up with the correct answer in a moment, not because he was quick at adding integers, but because he recognized a trick. The first number on the list (1) and the last (100) add up to 101. So do 3 and 98, and so do 4 and 97, etc., all the way up to 50 and 51. So really what you have here is 50 different sums of 101 each, so the answer is \( 50 \times 101 = 5050 \). In general, if you add the numbers from 1 to \( x \), where \( x \) is any integer at all, you'll get \( \frac{x(x + 1)}{2} \).

Now, use mathematical induction to prove that Gauss was right (i.e., that \( \sum_{i=1}^{x} i = \frac{x(x + 1)}{2} \)) for all numbers \( x \).

First, we have to cast our problem as a predicate about natural numbers. This is easy: say “let \( P(n) \) be the proposition that 

\[
\sum_{i=1}^{n} i = \frac{n(n + 1)}{2} 
\]

Then, we satisfy the requirements of induction:

1. **Base Case**: We prove that \( P(1) \) is true simply by plugging it in. Setting \( n = 1 \), we have:

   \[
   \sum_{i=1}^{1} i = \frac{1(1 + 1)}{2}
   \]
   \[
   1 = \frac{1(2)}{2}
   \]
   \[
   1 = 1 \quad \text{✓}
   \]

2. **Inductive Step**: We now must prove that \( P(k) \) implies \( P(k + 1) \). Put another way, we assume \( P(k) \) is true, and then use that assumption to prove that \( P(k + 1) \) is also true.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 251

Context: # 9.3. PROOF BY INDUCTION

## 2. Inductive Step
We now must prove that \((\forall i \leq k) P(i) \Rightarrow P(k + 1)\). Put another way, we assume that all trees smaller than the one we’re looking at have one more node than edge, and then use that assumption to prove that the tree we’re looking at also has one more node than edge.

We proceed as follows. Take any free tree with \(k + 1\) nodes. Removing any edge gives you two free trees, each with \(k\) nodes or less. (Why? Well, if you remove any edge from a free tree, the nodes will no longer be connected, since a free tree is “minimally connected” as it is. And we can't break it into more than two trees by removing a single edge, since the edge connects exactly two nodes and each group of nodes on the other side of the removed edge are still connected to each other.)

Now the sum of the nodes in these two smaller trees is still \(k + 1\). (This is because we haven't removed any nodes from the original free tree — we've simply removed an edge.) If we let \(k_1\) be the number of nodes in the first tree, and \(k_2\) the number of nodes in the second, we have \(k_1 + k_2 = k + 1\).

Okay, but how many edges does the first tree have? Answer: \(k_1 - 1\). How do we know that? By the inductive hypothesis. We’re assuming that any tree smaller than \(k + 1\) nodes has one less edge than node, and so we’re taking advantage of that (legal) assumption here. Similarly, the second tree has \(k_2 - 1\) edges.

Bingo. Removing one edge from our original tree of \(k + 1\) nodes gave us a total of \(k - 1\) edges. Therefore, that original tree must have had \(k\) edges. We have now proven that a tree of \(k + 1\) nodes has \(k\) edges, assuming that all smaller trees also have one less edge than node.

## 3. Conclusion
Therefore, by the strong form of mathematical induction, \(\forall n \geq 1, P(n)\).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 252

Context: # 9.4 Final Word

Finding proofs is an art. In some ways, it’s like programming: you have a set of building blocks, each one defined very precisely, and your goal is to figure out how to assemble those blocks into a structure that starts with only axioms and ends with your conclusion. It takes skill, patience, practice, and sometimes a little bit of luck.

Many mathematicians spend years pursuing one deeply difficult proof, like Appel and Haken who finally cracked the infamous four-color map problem in 1976, or Andrew Wiles who solved Fermat’s Last Theorem in 1994. Some famous mathematical properties may never have proofs, such as Christian Goldbach’s 1742 conjecture that every even integer is the sum of two primes; or the most elusive and important question in computing theory: does P=NP? (Put very simply: if you consider the class of problems where it’s easy to verify a solution once you have it, but crazy hard to find it in the first place, is there actually an easy algorithm for finding the solution that we just haven’t figured out yet?) Most computer scientists think “no,” but despite a mind-boggling number of hours invested by the brightest minds in the world, no one has ever been able to prove it one way or the other.

Most practicing computer scientists spend time taking advantage of the known results about mathematical objects and structures, and rarely (if ever) have to construct a watertight proof about them. For the more theoretically-minded student, however, who enjoys probing the basis behind the tools and speculating about additional properties that might exist, devising proofs is an essential skill that can also be very rewarding.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 253

Context: # Index

- n-choose-k notation, 156
- n-to-the-k-falling operator, 152
- a priori, 68
- modus ponens, 201, 226
- modus tollens, 226
- quad erat demonstrandum (Q.E.D.), 226
- reductio ad absurdum, 229
- acyclic (graphs), 91
- additivity property, 63
- adjacent (vertices), 89
- algorithm, 97, 127, 128, 132, 149, 150
- Ali, Muhammad, 92
- American Idol, 62, 68
- ancestor (of a node), 115
- and (logical operator), 18, 199, 203
- antisymmetric (relation), 40, 43
- Appel, Kenneth, 244
- arrays, 13
- artificial intelligence (AI), 197, 201, 225
- associative, 20
- asymmetric (relation), 41
- ATM machines, 143
- atomic (propositions), 198
- AVL trees, 133
- axioms, 226, 229
- background knowledge, 68, 70
- balancedness (of a tree), 132
- base case (of a proof), 233, 240
- bases (of number systems), 166, 168, 170
- Bayes' Theorem, 75
- Bayes, Thomas, 67
- Bayesian, 66
- BFT (breadth-first traversal), 95, 97
- Big-O notation, 127
- bijective (function), 49
- binary numbers, 25, 177, 178, 180, 182
- binary search trees, 123, 125
- binomial coefficients, 156
- bit, 177
- Booth, John Wilkes, 86
- BST property, 125, 131
- byte, 180
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 254

Context: ```
# CHAPTER 9. PROOF

- Cantor, Georg, 7, 12, 17
- capacity (of a byte), 182
- cardinality (of sets), 16, 25, 28, 66
- Carroll, Lewis, 227
- carry-in, 189
- carry-out, 189
- Cartesian product (of sets), 
  - 19, 35
- chess, 114
- child (of a node), 115
- closed interval, 61
- codomain (of a function), 45
- collectively exhaustive, 26
- combinations, 154
- combinators, 25, 141
- commutative, 18, 20, 71
- compilers, 114
- complement laws (of sets), 
  - 21
- complement, partial (of sets), 
  - 18
- complement, total (of sets), 
  - 18, 65, 146, 162
- complete binary tree, 121
- conclusion (of implication), 200
- conditional probability, 
  - 68, 72, 74, 78
- congruent, 173
- conjunction, 199, 208
- connected (vertices/graphs), 
  - 89, 95
- coordinates, 15
- curly brace notation, 11
- cycles, 90

## DAGs (directed acyclic graphs), 90

- data structures, 85
- Davies family, 8, 9, 26, 147, 154
- De Morgan's laws, 21, 22, 207, 208
- decimal numbers, 165, 169, 173, 178
- degree (of a vertex), 90
- depth (of a node), 115
- dequeuing, 95
- descendant (of a node), 115
- DFT (depth-first traversal), 99, 101

## Dijkstra's algorithm, 101, 104
- Dijkstra, Edsger, 101
- direct proof, 227
- directed graphs, 88, 91
- disjunction, 199, 208, 226
- disjunctive syllogism, 226
- disk sectors, 156
- distributive, 20, 208, 227
- domain (of a function), 45
- domain of discourse (?), 9, 
  - 19, 21, 27, 60, 210
- domination laws (of sets), 21
- dominors, 234
- drinking age, 232, 234
- duplicates (in sets), 13
- edges, 86, 87
- elements (of sets), 8, 15, 23
- ellipses, 12
- empty graph, 87
- empty set, 9, 16, 21, 24, 25, 
  - 36, 114
- endorelations, 38, 93
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 257

Context: 9.4. FINAL WORD
=================

- ordered triples, 15
- org charts, 113
- outcomes, 60, 62
- overflow, 188

P = NP?, 244  
parent (of a node), 114  
partial orders, 43  
partial permutations, 151, 154  
partitions, 26, 71, 94  
Pascal's triangle, 157  
passwords, 146  
paths (in a graph), 88, 113  
perfect binary tree, 122, 239  
permutations, 147  
PINs, 143  
poker, 160  
pop (off a stack), 99  
posts, 43  
post-order traversal, 118  
postulates, 226  
power sets, 24, 36  
pre-order traversal, 117  
predicates, 210, 211, 232  
predicate logic, 210  
premise (of implication), 200  
Prim's algorithm, 107  
Prime, Robert, 107  
prior probability, 68  
probability measures, 61, 63, 65  
product operator (II), 142, 160  
proof, 223  
proof by contradiction, 229  
propositional logic, 197, 225  
propositions, 197, 210  

- psychology, 70, 86
- push (on a stack), 99
- quantifiers, 212, 215
- queue, 95, 97
- quotient, 173, 174

- range (of a function), 48
- rational numbers (ℝ), 17, 24
- reachable, 89
- real numbers (ℝ), 71, 94
- rebalancing (a tree), 132
- recursion, 116, 120, 149, 231
- red-black trees, 133
- reflexive (relation), 40, 43
- relations, finite, 39
- relations, infinite, 39
- remainder, 173, 174
- right child, 116  
- root (of a tree), 112, 114  
- rooted trees, 112, 134  
- Russell's paradox, 15  

- sample space (Ω), 60  
- semantic network, 87  
- set operators, 18  
- set-builder notation, 11  
- sets, 8, 93  
- sets of sets, 15  
- sets, finite, 12  
- sets, fuzzy, 10  
- sets, infinite, 12  
- sibling (of a node), 115  
- signs-magnitude binary numbers, 183, 189  

**Sonic the Hedgehog**, 73  
- southern states, 72  
- spatial positioning, 92, 113
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 8

Context: 2. Notation
==============

- The notation log represents the logarithm in base two, while ln represents the natural logarithm.
- \( \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \) indicates that random vector \(\mathbf{x}\) is distributed according to a multivariate Gaussian distribution with mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\). The multivariate Gaussian pdf is denoted as \( \mathcal{N}(\mathbf{x} | \boldsymbol{\mu}, \Sigma) \) as a function of \(\mathbf{x}\).
- \( \mathbf{x} \sim U[a,b] \) indicates that \(\mathbf{x}\) is distributed according to a uniform distribution in the interval [a, b]. The corresponding uniform pdf is denoted as \( U(a,b) \).
- \( \delta(x) \) denotes the Dirac delta function or the Kronecker delta function, as clear from the context.
- \( \|\mathbf{a}\|_2^2 = \sum_{i=1}^N a_i^2 \) is the quadratic, or \( l_2 \) norm of a vector \(\mathbf{a} = [a_1, \ldots, a_N]^T\). We similarly define the \( l_1 \) norm as \( \|\mathbf{a}\|_1 = \sum_{i=1}^N |a_i| \), and the \( l_0 \) pseudo-norm \( \|\mathbf{a}\|_0 \) as the number of non-zero entries of vector \(\mathbf{a}\).
- \( I \) denotes the identity matrix, whose dimensions will be clear from the context. Similarly, I represents a vector of all ones.
- \( \mathbb{R} \) is the set of real numbers; \( \mathbb{R}^+ \) the set of non-negative real numbers; \( \mathbb{R}^- \) the set of non-positive real numbers; and \( \mathbb{R}^N \) is the set of all vectors of \( N \) real numbers.
- \( |S| \) represents the cardinality of a set \( S \).
- \( \mathbf{Z} \) represents a set of random variables indexed by the integers \( k \in S \).
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 13

Context: # 1.1. What is Machine Learning?

This starts with an in-depth analysis of the problem domain, which culminates with the definition of a mathematical model. The mathematical model is meant to capture the key features of the problem under study and is typically the result of the work of a number of experts. The mathematical model is finally leveraged to derive hand-crafted solutions to the problem.

For instance, consider the problem of defining a chemical process to produce a given molecule. The conventional flow requires chemists to leverage their knowledge of models that predict the outcome of individual chemical reactions, in order to craft a sequence of suitable steps that synthesize the desired molecule. Another example is the design of speech translation or image/video compression algorithms. Both of these tasks involve the definition of models and algorithms by teams of experts, such as linguists, psychologists, and signal processing practitioners, not infrequently during the course of long standardization meetings.

The engineering design flow outlined above may be too costly and inefficient for problems in which faster or less expensive solutions are desirable. The machine learning alternative is to collect large data sets, e.g., of labeled speech, images, or videos, and to use this information to train general-purpose learning machines to carry out the desired task. While the standard engineering flow relies on domain knowledge and on design optimized for the problem at hand, machine learning lets large amounts of data dictate algorithms and solutions. To this end, rather than requiring a precise model of the set-up under study, machine learning requires the specification of an objective, of a model to be trained, and of an optimization technique.

Returning to the first example above, a machine learning approach would proceed by training a general-purpose machine to predict the outcome of known chemical reactions based on a large data set, and then by using the trained algorithm to explore ways to produce more complex molecules. In a similar manner, large data sets of images or videos would be used to train a general-purpose algorithm with the aim of obtaining compressed representations from which the original input can be recovered with some distortion.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 17

Context: # 1.3 Goals and Outline

This monograph considers only passive and offline learning.

## 1.3 Goals and Outline

This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. Unlike other texts that are focused on one particular aspect of the field, an effort has been made here to provide a broad but concise overview in which the main ideas and techniques are systematically presented. Specifically, the material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra. A prior exposure to information theory is useful but not required.

Detailed discussions are provided on basic concepts and ideas, including overfitting and generalization, Maximum Likelihood and regularization, and Bayesian inference. The text also endeavors to provide intuitive explanations and pointers to advanced topics and research directions. Sections and subsections containing more advanced material that may be regarded at a first reading are marked with a star (+).

The reader will find here further discussions on computing platform or programming frameworks, such as map-reduce, nor details on specific applications involving large data sets. These can be easily found in a vast and growing body of work. Furthermore, rather than providing exhaustive details on the existing myriad solutions in each specific category, techniques have been selected that are useful to illustrate the most salient aspects. Historical notes have also been provided only for a few selected milestone events.

Finally, the monograph attempts to strike a balance between the algorithmic and theoretical viewpoints. In particular, all learning algorithms are presented in a manner that emphasizes their theoretical foundations while also providing practical insights.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 22

Context: ```markdown
variables \( t_n \) are assumed to be dependent on \( x_n \), and are referred to as dependent variables, labels, or responses. An example is illustrated in Fig. 2.1. We use the notation \( \mathbf{x}_D = (x_1, \ldots, x_N)^T \) for the covariates and \( \mathbf{t}_D = (t_1, \ldots, t_N)^T \) for the labels in the training set \( D \). Based on this data, the goal of supervised learning is to identify an algorithm to predict the label \( t \) for a new, that is, as of yet unobserved, domain point \( x \).

![Figure 2.1: Example of a training set \( D \) with \( N = 10 \) points \( (x_n, t_n) \), \( n = 1, \ldots, N \).](path_to_image)

The outlined learning task is clearly impossible in the absence of additional information on the mechanism relating variables \( z \) and \( t \). With reference to Fig. 2.1, unless we assume, say, that \( z \) and \( t \) are related by a function \( t = f(z) \) with some properties, such as smoothness, we have no way of predicting the label \( t \) for an unobserved domain point \( z \). This observation is formalized by the no free lunch theorem to be reviewed in Chapter 5: one cannot learn rules that generalize to unseen examples without making assumptions about the mechanism generating the data. The set of all assumptions made by the learning algorithm is known as the inductive bias.

This discussion points to a key difference between memorizing and learning. While the former amounts to mere retrieval of a value \( t_n \), 
```
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 49

Context: # 2.5. Minimum Description Length (MDL)

In a gray scale image, fix some probability mass function (pmf) \( p(x) \) on this alphabet. A key result in information theory states that it is possible to design a lossless compression scheme that uses \( [-\log p(x)] \) bits to represent value \( x \).

By virtue of this result, the choice of a probability distribution \( p(x) \) is akin to the selection of a lossless compression scheme that produces a description of around \( -\log p(x) \) bits to represent value \( x \). Note that the description length \( -\log p(x) \) decreases with the probability assigned by \( p(x) \) to value \( x \): more likely values under \( p(x) \) are assigned a smaller description. Importantly, a decoder would need to know \( p(x) \) in order to recover \( x \) from the bit description.

At an informal level, the MDL criterion prescribes the selection of a model that compresses the training data to the shortest possible description. In other words, the model selected by MDL defines a compression scheme that describes the data set \( D \) with the minimum number of bits. As such, the MDL principle can be thought of as a formalization of Occam’s razor: choose the model that yields the simplest explanation of the data. As we will see below, this criterion naturally leads to a solution that penalizes overfitting.

### What is the length of the description of a data set \( D \) that results from the selection of a specific value of \( M \)?

The answer is not straightforward, since, for a given value of \( M \), there are as many probability distributions as there are values for the corresponding parameters \( \theta \) to choose from. A formal calculation of the description length would hence require the introduction of the concept of universal compression for a given probabilistic model \( \theta \). Here, we will limit ourselves to a particular class of universal codes known as two-part codes.

Using two-part codes, we can compute the description length for the data \( D \) that results from the choice of a model over \( M \) as follows. First, we obtain the ML solution \( (m_{ML}, \hat{\theta}_{ML}) \). Then, we describe the data set by using a compression scheme defined by the probability \( p(x | m_{ML}, \hat{\theta}_{ML}) \). As discussed, this pro-
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 57

Context: # 3  
## Probabilistic Models for Learning  

In the previous chapter, we have introduced the frequentist, Bayesian, and MDL learning frameworks. As we have seen, parametric probabilistic models play a key role for all three of them. The linear regression example considered in the previous chapter was limited to a simple linear Gaussian model, which is insufficient to capture the range of learning problems that are encountered in practice. For instance, scenarios of interest may encompass discrete variables or non-negative quantities.  

In this chapter, we introduce a family of probabilistic models, known as the **exponential family**, whose members are used as components in many of the most common probabilistic models and learning algorithms. The treatment here will be leveraged in the rest of the monograph in order to provide the necessary mathematical background. Throughout this chapter, we will specifically emphasize the common properties of the models in the exponential family, which will prove useful for deriving learning algorithms in the following chapters.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 58

Context: # 3.1 Preliminaries

We start with a brief review of some definitions that will be used throughout the chapter and elsewhere in the monograph (see [28] for more details). Readers with a background in convex analysis and calculus may just review the concept of sufficient statistic in the last paragraph.

First, we define a **convex set** as a subset of \(\mathbb{R}^D\), for some \(D\), that contains all segments between any two points in the set. Geometrically, convex sets hence cannot have “indentations.” Function \(f(x)\) is convex if its domain is a convex set and if it satisfies the inequality \(f(\lambda x_1 + (1 - \lambda)x_2) \leq \lambda f(x_1) + (1 - \lambda)f(x_2)\) for all \(x_1\) and \(x_2\) in its domain and for all \(0 \leq \lambda \leq 1\). Geometrically, this condition says that the function is “U”-shaped: the curve defining the function cannot be above the segment obtained by connecting any two points on the curve. A function is strictly convex if the inequality above is strict except for \(\lambda = 0\) or \(\lambda = 1\); a concave, or strictly concave, function is defined by reversing the inequality above – it is hence “n-shaped.”

The minimization of a convex (“U”) function over a convex constraint set or the maximization of a concave (“n”) function over a convex constraint set are known as convex optimization problems. For these problems, there exist powerful analytical and algorithmic tools to obtain globally optimal solutions [28].

We also introduce two useful concepts from calculus. The **gradient** of a differentiable function \(f(x)\) with \(x = [x_1, \ldots, x_D]^T \in \mathbb{R}^D\) is defined as the \(D \times 1\) vector \(\nabla f(x) = [\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_D}]^T\) containing all partial derivatives. At any point \(x\) in the domain of the function, the gradient is a vector that points to the direction of locally maximal increase of the function. The Hessian \(\nabla^2 f(x)\) is the \(D \times D\) matrix with \((i,j)\) element given by the second-order derivative \(\frac{\partial^2 f(x)}{\partial x_i \partial x_j}\). It captures the local curvature of the function.

1. A statistic is a function of the data.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 64

Context: ```markdown
setting \( C = 2 \) recovers the Bernoulli distribution. PMFs in this model are given as:

\[
\text{Cat}(x| \mu) = \prod_{k=0}^{C-1} \mu_k^{I(x=k)} \times (1 - \sum_{k=1}^{C-1} \mu_k) \quad (3.16)
\]

where we have defined \( \mu_k = \Pr[x = k] \) for \( k = 1, \ldots, C - 1 \) and \( \mu_0 = 1 - \sum_{k=1}^{C-1} \mu_k = \Pr[x = 0] \). The log-likelihood (LL) function is given as:

\[
\ln(\text{Cat}(x|\mu)) = \sum_{k=1}^{C-1} I(x=k) \ln \mu_k + \ln \mu_0 \quad (3.17)
\]

This demonstrates that the categorical model is in the exponential family, with sufficient statistics vector \( u(x) = [I(x = 0) \ldots I(x = C - 1)]^T \) and measure function \( m(x) = 1 \). Furthermore, the mean parameters \( \mu = [\mu_1, \ldots, \mu_{C-1}]^T \) are related to the natural parameter vector \( \eta = [\eta_1, \ldots, \eta_{C-1}]^T \) by the mapping:

\[
\eta_k = \ln \left( \frac{\mu_k}{1 - \sum_{j=1}^{C-1} \mu_j} \right) \quad (3.18)
\]

which again takes the form of an LLR. The inverse mapping is given by:

\[
\mu = \left( \frac{e^{\eta_k}}{1 + \sum_{j=1}^{C-1} e^{\eta_j}} \right) \quad (3.19)
\]

The parameterization given here is minimal, since the sufficient statistics \( u(x) \) are linearly independent. An overcomplete representation would instead include in the vector of sufficient statistics also the function \( I(x=0) \). In this case, the resulting vector of sufficient statistics is:

\[
u(x) = 
\begin{cases} 
1 & \text{if } x = 0 \\ 
1 & \text{if } x = C - 1 
\end{cases} \quad (3.20)
\]

which is known as one-hot encoding of the categorical variable, since only one entry equals 1 while the others are zero. Furthermore, with this...
```
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 85

Context: # 4.2. Classification as a Supervised Learning Problem

The binary classification problem is illustrated in Fig. 4.1. Given a training set \( D \) of labeled examples \( x_n, n = 1, \ldots, N \), the problem is to assign a new example \( x \) to either class \( C_0 \) or \( C_1 \). In this particular standard data set, the two variables in each vector \( x \) measure the sepal length and sepal width of an iris flower. The latter may belong to either the setosa or virginica family, as encoded by the label \( t_n \) and represented in the figure with different markers. Throughout, we denote as \( D \) the dimension of the domain point \( x \) (\( D = 2 \) in Fig. 4.1).

![Binary classification illustration](path/to/figure4.1.png)

**Figure 4.1:** Illustration of the binary (\( K = 2 \) classes) classification problem with a domain space of dimension \( D = 2 \): to which class should the new example \( x \) be assigned?

Following the taxonomy introduced in Chapter 2, we can distinguish the following modeling approaches, which will be reviewed in the given order throughout the rest of this chapter:

- **Deterministic deterministic models:** Model directly the deterministic mapping between domain point and label via a parameterized function \( t \approx \hat{t}(x) \).
  
- **Discriminative probabilistic models:** Model the probability of a point \( x \) belonging to class \( C_k \) via a parameterized conditional pmf \( p(t|x) \), with the relationship between \( t \) and \( C_k \) defined in (4.4). We will also
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 130

Context: # 5.4 VC Dimension and Fundamental Theorem of PAC Learning

We have seen that finite classes are PAC learnable with sample complexity proportional to the model capacity in \(|H|\) by using ERM. In this section, we address the following questions: Is \(N_{\text{SRM}}(N,\delta)\) the smallest sample complexity? How can we define the capacity of infinite hypothesis classes? We have discussed at the end of Section 5.3 that the answer to the latter question cannot be found by extrapolating from results obtained when considering finite hypothesis classes. In contrast, we will see here that the answers to both of these questions rely on the concept of VC dimension, which serves as a more fundamental definition of capacity of a model. The VC dimension is defined next.

## Definition 5.4

A hypothesis class \(\mathcal{H}\) is said to shatter a set of domain points \(X = \{x_1,\ldots,x_n\}\) if, no matter how the corresponding labels \(\{y_i \in \{0,1\}\}\) are selected, there exists a hypothesis \(h \in \mathcal{H}\) that ensures \(\hat{y}_{n} = y_n\) for all \(n = 1,\ldots,V\).

## Definition 5.5

The VC dimension \(\text{VCdim}(\mathcal{H})\) of the model \(\mathcal{H}\) is the size of the largest set \(X\) that is shattered by \(\mathcal{H}\).

Based on the definitions above, to prove that a model has VCdim(\(\mathcal{H}\)) = V, we need to carry out the following two steps:

1. **Step 1:** Demonstrate the existence of a set \(X\) with \(|X| = V\) that is shattered by \(\mathcal{H}\);
2. **Step 2:** Prove that no set \(X'\) of dimensions \(V + 1\) exists that is shattered by \(\mathcal{H}\).

The second step is typically seen to be more difficult, as illustrated by the following examples.

## Example 5.3

The threshold function model (5.8) has VCdim(\(\mathcal{H}) = 1\), since there is clearly a set \(X\) of one sample (\(V = 1\)) that can be shattered.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 173

Context: # 7.1 Introduction

In this section, we start by discussing two examples that illustrate the type of structural information that can be encoded by means of probabilistic graphical models. We then provide an overview of this chapter.

As illustrated by the two examples below, structured probabilistic models can be used to set up parametric models for both supervised and unsupervised learning. In the former case, all variables are observed in the training set, with some revs being inputs, i.e., covariates (\(x\)), and others being considered as outputs, or targets (\(y\)). In contrast, in the latter case, some variables are unobserved and play the role of latent variables (\(z\)) that help explain or generate the observed variables (\(y\)).[^1]

## Example 7.1

Consider the tasks of text classification via supervised learning or topic clustering via unsupervised learning. In the supervised learning case, the problem is to classify documents depending on their topic, e.g., sport, politics or entertainment, based on a set of labeled documents. With unsupervised learning, the problem is to cluster documents according to the similarity of their contents based on the sole observation of the documents themselves.

A minimal model for this problem should include a variable \(t\) representing the topic and a variable \(x\) for the document. The topic can be represented by a categorical variable taking \(T\) values, i.e., \(t \in \{1, ... , T\}\), which is observed for supervised learning and latent for unsupervised learning. As for the document, with “bag-of-words” encoding, a set of \(W\) words of interest is selected, and a document is encoded as \(W \times 1\) binary vector \(z = [z_1, ..., z_W]^T\), where \(z_w = 1\) if word \(w\) is contained in the document.[^2]

To start, we could try to use an unstructured directed model defined as

\[
t \sim \text{Cat}(π) \tag{7.1a}
\]

\[
x | t \sim \text{Cat}(t) \tag{7.1b}
\]

[^1]: Strictly speaking, this distinction applies to the frequentist approach, since in the Bayesian approach the model parameters are always treated as unobserved random variables.
[^2]: Note that \(W\) here does not represent a matrix of weights!
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 177

Context: # 7.2. Bayesian Networks

or the Markov chain notation

\[
x_{uv} \; t = \{x_{uv}\} \; f_{xy}.
\]

Mathematically, this condition means that we have \( p(x_u|t, x_{p}) = p(x_u|t) \) where \( x_{p} \) is any subset of the variables \( \{x_{uv}\} \; f_{xy} \). Applying the chain rule to the rvs \( \{x_{w} \}_{w=1}^{W} \) using order \( x_{1}, x_{2}, \ldots, x_{W} \) (or any other order on the \( \{x_{w}\}_{w=1}^{W} \) variables), we can write

\[
p(x, t) = p(t) \prod_{n=1}^{W} p(x_n|t).
\]

This factorization is represented by the BN in Fig. 7.1. In the directed graph shown in the figure, each vertex corresponds to a random variable (rv), and a directed edge is included from \( t \) to each variable \( x_u \). This edge captures the fact that the conditional probability of the variable \( x_u \) in (7.6) is conditioned on \( t \). Informally, it “causes” all variables in vector \( x \). The graph accounts for multiple i.i.d. realization \( (x_{tp})_{p=1}^{N} = \{x_{1}, \ldots, x_{N}\} \). The joint distribution factorizes as

\[
p(x_{p, t_p}) = \prod_{n=1}^{N} p(t) \prod_{u=1}^{W} p(x_u|t).
\]

The BN uses the title notation that will be formalized below to indicate multiple independent realizations of rvs with the same distribution.

Generalizing the example above, we define BNs as follows.

**Definition 7.1.** A BN is a directed acyclic graph (DAG)³, whose vertices represent rvs \( \{x_1, \ldots, x_K\} \) with an associated joint distribution that factorizes as

\[
p(x_{1}, \ldots, x_{K}) = \prod_{k=1}^{K} p(x_k|x_{pa(k)}).
\]

where \( P(k) \) denotes the set of parents of node \( k \) in the DAG. In a BN, rvs are represented by filled circles, while learned parameters defining the conditional distributions are represented by dots.

³In a DAG, there are no directed cycles, that is, no closed paths following the direction of the arrows.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 191

Context: When the undirected graph is not a tree, one can use the **junction tree algorithm** for exact Bayesian inference. The idea is to group subsets of variables together in cliques, in such a way that the resulting graph is a tree. The complexity depends on the treewidth of the resulting graph. When this complexity is too high for the given application, approximate inference methods are necessary. This is the subject of the next chapter.

## 7.5 Summary

Probabilistic graphical models encode a priori information about the structure of the data in the form of causality relationships — via directed graphs and Bayesian networks (BNs) — or mutual affinities — via undirected graphs and Markov Random Fields (MRFs). This structure translates into conditional independence conditions. The structural properties encoded by probabilistic graphical models have the potential advantage of controlling the capacity of a model, hence contributing to the reduction of overfitting at the expense of possible bias effects (see Chapter 5). They also facilitate Bayesian inference (Chapters 2-4), at least in graphs with tree-like structures. Probabilistic graphical models can be used as the underlying probabilistic framework for supervised, unsupervised, and semi-supervised learning problems, depending on which subsets of random variables (rvs) are observed or latent.

While graphical models can reduce the complexity of Bayesian inference, this generally remains computationally infeasible for most models of interest. To address this problem, the next chapter discusses approximate Bayesian inference, as well as associated learning problems (Chapter 6).
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 9

Context: # Learning and Intuition

We have all experienced the situation that the solution to a problem presents itself while riding your bike, walking home, “relaxing” in the washroom, waking up in the morning, taking your shower, etc. Importantly, it did not appear while banging your head against the problem in a conscious effort to solve it, staring at the equations on a piece of paper. In fact, I would claim that all my bits and pieces of progress have occurred while taking a break and “relaxing out of the problem.” Greek philosophers walked in circles when thinking about a problem; most of us stare at a computer screen all day. The purpose of this chapter is to make you more aware of where your creative mind is located and to interact with it in a fruitful manner.

My general thesis is that, contrary to popular belief, creative thinking is not performed by conscious thinking. It is rather an interplay between your conscious mind, which prepares the seeds to be planted into the unconscious part of your mind. The unconscious mind will munch on the problem “out of sight” and return promising roads to solutions to the consciousness. This process iterates until the conscious mind decides the problem is sufficiently solved, intractable, or plain dull and moves on to the next. It may be a little unsettling to learn that at least part of your thinking goes on in a part of your mind that seems inaccessible and has a very limited interface with what you think of as yourself. But it is undeniable that it is there, and it is also undeniable that it plays a role in the creative thought process.

To become a creative thinker, one should learn to play this game more effectively. To do so, we should think about the language in which to represent knowledge that is most effective in terms of communication with the unconscious. In other words, what type of “interface” between conscious and unconscious mind should we use? It is probably not a good idea to memorize all the details of a complicated equation or problem. Instead, we should extract the abstract idea and capture the essence of it in a picture. This could be a movie with colors and other elements.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 11

Context: Many people may find this somewhat experimental way to introduce students to new topics counter-productive. Undoubtedly for many it will be. If you feel under-challenged and become bored, I recommend moving on to the more advanced textbooks, of which there are many excellent samples on the market (for a list see [books](#)). But I hope that for most beginning students, this intuitive style of writing may help to gain a deeper understanding of the ideas that I will present in the following. Above all, have fun!
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 15

Context: # 1.1. DATA REPRESENTATION

Most datasets can be represented as a matrix, \( X = [X_{n,k}] \), with rows indexed by "attribute-index" \( i \) and columns indexed by "data-index" \( n \). The value \( X_{n,k} \) for attribute \( i \) and data-case \( n \) can be binary, real, discrete, etc., depending on what we measure. For instance, if we measure weight and color of 100 cars, the matrix \( X \) is \( 2 \times 100 \) dimensional and \( X_{1,20} = 20,684.57 \) is the weight of car nr. 20 in some units (a real value) while \( X_{2,20} = 2 \) is the color of car nr. 20 (say one of 6 predefined colors).

Most datasets can be cast in this form (but not all). For documents, we can give each distinct word of a prespecified vocabulary a number and simply count how often a word was present. Say the word "book" is defined to have nr. 10, 568 in the vocabulary then \( X_{1,1005} = 4 \) would mean: the word book appeared 4 times in document 5076. Sometimes the different data-cases do not have the same number of attributes. Consider searching the internet for images about cats. You’ll retrieve a large variety of images most with a different number of pixels. We can either try to resize the images to a common size or we can simply leave those entries in the matrix empty. It may also occur that a certain entry is supposed to be there but it couldn’t be measured. For instance, if we run an optical character recognition system on a scanned document some letters will not be recognized. We’ll use a question mark “?” to indicate that that entry wasn’t observed.

It is very important to realize that there are many ways to represent data and not all are equally suitable for analysis. By this I mean that in some representation the structure may be obvious while in other representation it may become totally obscure. It is still there, but just harder to find. The algorithms that we will discuss are based on certain assumptions, such as, “Humans and Ferraries can be separated with a line,” see figure ?. While this may be true if we measure weight in kilograms and height in meters, it is no longer true if we decide to re-code these numbers into bit-strings. The structure is still in the data, but we would need a much more complex assumption to discover it. A lesson to be learned is thus to spend some time thinking about in which representation the structure is as obvious as possible and transform the data if necessary before applying standard algorithms. In the next section we’ll discuss some standard preprocessing operations. It is often advisable to visualize the data before preprocessing and analyzing it. This will often tell you if the structure is a good match for the algorithm you had in mind for further analysis. Chapter ? will discuss some elementary visualization techniques.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 16

Context: # CHAPTER 1. DATA AND INFORMATION

## 1.2 Preprocessing the Data

As mentioned in the previous section, algorithms are based on assumptions and can become more effective if we transform the data first. Consider the following example, depicted in Figure 1. The algorithm we consist of estimating the area that the data occupy. It grows a circle starting at the origin and at the point it contains all the data we record the area of a circle. In the figure why this will be a bad estimate: the data-cloud is not centered. If we had first centered it we would have obtained reasonable estimate. Although this example is somewhat simple-minded, there are many, much more interesting algorithms that assume centered data. To center data we will introduce the sample mean of the data, given by,

\[ 
E[X_i] = \frac{1}{N} \sum_{n=1}^{N} X_{in} \tag{1.1} 
\]

Hence, for every attribute \(i\) separately, we simple add all the attribute value across data-cases and divide by the total number of data-cases. To transform the data so that their sample mean is zero, we set,

\[ 
X'_{in} = X_{in} - E[X_i], \quad \forall n 
\tag{1.2} 
\]

It is now easy to check that the sample mean of \(X'\) indeed vanishes. An illustration of the global shift is given in Figure 2. We also see in this figure that the algorithm described above now works much better!

In a similar spirit as centering, we may also wish to scale the data along the coordinate axis in order to make it more “spherical.” Consider Figure 3. In this case the data was first centered, but the elongated shape still prevented us from using the simplistic algorithm to estimate the area covered by the data. The solution is to scale the axes so that the spread is the same in every dimension. To define this operation we first introduce the notion of sample variance,

\[ 
V[X_i] = \frac{1}{N} \sum_{n=1}^{N} X_{in}^2 \tag{1.3} 
\]

where we have assumed that the data was first centered. Note that this is similar to the sample mean, but now we have used the square. It is important that we have removed the sign of the data-cases (by taking the square) because otherwise positive and negative signs might cancel each other out. By first taking the square, all data-cases first get mapped to positive half of the axes (for each dimension or
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 24

Context: # CHAPTER 3. LEARNING

They understood that this was a lion. They understood that all lions have these particular characteristics in common, but may differ in some other ones (like the presence of a scar or scar someplace).

Bob has another disease which is called over-generalization. Once he has seen an object he believes almost everything is some, perhaps twisted instance of the same object class (In fact, I seem to suffer from this so now and then when I think all of machine learning can be explained by this one new exciting principle). If ancestral Bob walks the savanna and he has just encountered an instance of a lion and fled into a tree with his buddies, the next time he sees a squirrel he believes it is a small instance of a dangerous lion and flees into the trees again. Over-generalization seems to be rather common among small children.

One of the main conclusions from this discussion is that we should neither over-generalize nor over-fit. We need to be on the edge of being just right. But just right about what? It doesn’t seem there is one correct God-given definition of the category chairs. We seem to all agree, but one can surely find examples that would be difficult to classify. When do we generalize exactly right? The magic word is **PREDICTION**. From an evolutionary standpoint, all we have to do is make correct predictions about aspects of life that help us survive. Nobody really cares about the definition of lion, but we do care about our responses to the various animals (run away for lion, chase for deer). And there are a lot of things that can be predicted in the world. This food kills me but food is good for me. Drumming my fists on my hairy chest in front of a female generates opportunities for sex; sticking my hand into that yellow-orange flickering “flame” hurts my hand and so on. The world is wonderfully predictable and we are very good at predicting it.

So why do we care about object categories in the first place? Well, apparently they help us organize the world and make accurate predictions. The category *lions* is an abstraction and abstractions help us to generalize. In a certain sense, learning is all about finding useful abstractions or concepts that describe the world. Take the concept “fluid”; it describes all watery substances and summarizes some of their physical properties. Or the concept of “weight”: an abstraction that describes a certain property of objects.

Here is one very important corollary for you: 

> “machine learning is not in the business of remembering and regurgitating observed information, it is in the business of transferring (generalizing) properties from observed data onto new, yet unobserved data.” 

This is the mantra of machine learning that you should repeat to yourself every night before you go to bed (at least until the final exam).

The information we receive from the world has two components to it: there
##########

"""QUERY: an you summarize in 2 sentences what is discrete math?"""

Consider the chat history for relevant information. Use all information included. Use as much tokens as needed. Important: If you find information separated by a | in the context, it is a table formatted in Markdown.Important: Take a look at the QUERY and only the QUERY. If this is vague or unclear, please ignore everything and ask a follow-up question instead!


Final Files Sources: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 1, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 3, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 5, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 6, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 9, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 10, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 11, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 12, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 13, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 15, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 17, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 18, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 21, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 22, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 23, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 24, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 25, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 27, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 28, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 30, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 31, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 33, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 36, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 38, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 39, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 40, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 41, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 43, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 49, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 52, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 57, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 60, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 64, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 67, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 68, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 71, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 74, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 81, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 89, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 93, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 95, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 100, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 101, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 109, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 119, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 123, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 135, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 144, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 145, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 149, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 150, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 151, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 152, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 156, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 161, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 165, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 167, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 170, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 175, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 188, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 197, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 201, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 205, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 207, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 216, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 217, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 219, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 222, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 224, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 226, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 230, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 231, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 233, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 235, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 238, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 240, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 244, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 251, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 252, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 253, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 254, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 257, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 8, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 13, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 17, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 22, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 49, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 57, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 58, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 64, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 85, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 130, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 173, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 177, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 191, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 9, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 11, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 15, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 16, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 24
==================================================
**Elapsed Time: 3.89 seconds**
==================================================


FINAL ANSWER 
Answer: Discrete mathematics is a branch of mathematics that focuses on the study of distinct and separate objects, such as integers, graphs, and logical statements, rather than continuous quantities. It encompasses various concepts including sets, functions, relations, and combinatorial structures, which are essential for applications in computer science, information technology, and algorithm design.
==================================================
**Elapsed Time: 1.13 seconds**
==================================================