{
    "query": "Summarize in 5 sentences the contents of your context. Speak in Old English manner similar to how Shakespeare writes his sonnets.",
    "namespace": "e791d701-717d-4c06-9602-2878acdbb6ad",
    "messages": [
        {
            "role": "user",
            "content": "Summarize in 5 sentences the contents of your context. Speak in Old English manner similar to how Shakespeare writes his sonnets."
        },
        {
            "role": "assistant",
            "content": " "
        }
    ],
    "stream": true,
    "language_level": "",
    "chat_channel": ":133:1055",
    "language": "German",
    "tone": "neutral",
    "writing_style": "standard",
    "model": "gpt-4o-mini-2024-07-18",
    "knowledgebase": "ki-dev-large",
    "seed": null,
    "client_id": 1055,
    "all_context": false,
    "follow_up_for": null,
    "knowledgebase_files_count": 3,
    "override_command": "",
    "disable_clarity_check": false,
    "custom_primer": "",
    "logging": true,
    "query_route": ""
}


INITIALIZATION
Knowledgebase: ki-dev-large
Base Query: Summarize in 5 sentences the contents of your context. Speak in Old English manner similar to how Shakespeare writes his sonnets.
Model: gpt-4o-mini-2024-07-18
Use Curl?: 
==================================================
**Elapsed Time: 0.00 seconds**
==================================================


ROUTING
Query type: summary
==================================================
**Elapsed Time: 1.88 seconds**
==================================================


RAG PARAMETERS
Max Context To Include: 100
Lowest Score to Consider: 0.1
==================================================
**Elapsed Time: 0.00 seconds**
==================================================


VECTOR SEARCH ALGORITHM TO USE 
Use MMR search?: True
Use Similarity search?: False
==================================================
**Elapsed Time: 0.10 seconds**
==================================================


VECTOR SEARCH DONE 
==================================================
**Elapsed Time: 6.19 seconds**
==================================================


PRIMER 
Primer: 
            You are Simon, a highly intelligent personal assistant in a system called KIOS. You are a chatbot that
            can read knowledgebases through the "CONTEXT" that is included in the user's chat message.

        
            Your role is to act as an expert at summarization and analysis.

            In your responses to enterprise users, prioritize clarity, trustworthiness, and appropriate formality.

            Be honest by admitting when a topic falls outside your scope of knowledge, and suggest
            alternative avenues for obtaining information when necessary.

            Make effective use of chat history to avoid redundancy and enhance response relevance, continuously
            adapting to integrate all necessary details in your interactions.

            Use as much tokens as possible to provide a detailed response.
        
==================================================
**Elapsed Time: 0.19 seconds**
==================================================


FINAL QUERY 
Final Query: CONTEXT: ##########
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 1

Context: # A Cool Brisk Walk  
Through Discrete Mathematics

**Version 2.2.1**  
**Stephen Davies, Ph.D.**  

## Table of Contents
1. Introduction
2. Concepts
   - Sets
   - Functions
   - Relations
3. Applications
4. Conclusion

## Introduction
In this document, we explore various topics under discrete mathematics. 

## Concepts

### Sets
- Definition
- Notation
- Operations

### Functions
- Definition
- Types of functions
- Examples

### Relations
- Definition
- Properties

| Concept   | Description                       |
|-----------|-----------------------------------|
| Set      | A collection of distinct objects   |
| Function  | A relation from a set to itself    |
| Relation  | A connection between two sets       |

## Applications
The concepts of discrete mathematics can be applied in various fields, including computer science, information technology, and combinatorial design.

## Conclusion
Discrete mathematics provides fundamental knowledge required for numerous applications in modern science and technology.

Image Analysis: 

### Analysis of the Attached Visual Content

#### 1. Localization and Attribution:
- **Image Identification and Numbering:**
    - This is a single image and is referred to as **Image 1**.

#### 2. Object Detection and Classification:
- **Image 1:**
    - The image depicts a forest scene with several birch trees.
    - **Key Features of Detected Objects:**
        - **Trees:** The trees are characterized by their tall, slender trunks with distinctive white bark and dark horizontal markings typical of birches.
        - **Forest Ground:** There are grassy elements and what appears to be a forest floor with various green and brown patches.

#### 3. Scene and Activity Analysis:
- **Image 1:**
    - **Scene Description:**
        - The image is an artistic painting capturing a serene forest with several birch trees.
        - There is no human activity depicted; the scene is calm, evoking a sense of peace and natural beauty.

#### 4. Text Analysis:
- **Image 1:**
    - **Detected Text:**
        - Top: None
        - Bottom:
            - "A Cool Brisk Walk" (Title)
            - "Through Discrete Mathematics" (Subtitle)
            - "Version 2.2.1" (Version Information)
            - "Stephen Davies, Ph.D." (Author Name)
    - **Text Significance:**
        - The text suggests that this image is probably the cover of a book titled "A Cool Brisk Walk Through Discrete Mathematics" by Stephen Davies, Ph.D. The version number indicates it is version 2.2.1, signifying a possibly updated or revised edition.

#### 8. Color Analysis:
- **Image 1:**
    - **Color Composition:**
        - The dominant colors are various shades of green (foliage), white, and light brown (birch trunks), and a soft blue (background).
    - **Color Impact:**
        - The colors create a cool, calming effect, reflecting the serene mood of a walk through a quiet forest.

#### 9. Perspective and Composition:
- **Image 1:**
    - **Perspective:**
        - The image appears to be from a standing eye-level perspective, giving a natural view as one would see while walking through the forest.
    - **Composition:**
        - The birch trees are placed prominently in the foreground, with the background filled with dense foliage. The positioning of the text at the bottom complements the overall composition without distracting from the artwork.

#### 10. Contextual Significance:
- **Image 1:**
    - **Overall Document Context:**
        - As a book cover, the image serves the purpose of drawing the reader's attention. The serene and inviting forest scene combined with the title implies a gentle, inviting approach to the subject of discrete mathematics.

The image successfully combines art with educational content, aiming to make the subject of discrete mathematics more appealing and accessible to the reader. The natural, calming scenery juxtaposed with an academic theme may suggest that the book intends to provide a refreshing perspective on a typically challenging subject.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 2

Context: Copyright © 2023 Stephen Davies.

## University of Mary Washington
**Department of Computer Science**  
James Farmer Hall  
1301 College Avenue  
Fredericksburg, VA 22401

---

Permission is granted to copy, distribute, transmit and adapt this work under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).

The accompanying materials at [www.allthemath.org](http://www.allthemath.org) are also under this license.

If you are interested in distributing a commercial version of this work, please contact the author at [stephen@umw.edu](mailto:stephen@umw.edu).

The LaTeX source for this book is available from: [https://github.com/divilian/cool-brisk-walk](https://github.com/divilian/cool-brisk-walk).

---

Cover art copyright © 2014 Elizabeth M. Davies.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 3

Context: # Contents at a glance

## Contents at a glance
- **Contents at a glance** …… i
- **Preface** …… iii
- **Acknowledgements** …… v
  
1. **Meetup at the trailhead** …… 1
2. **Sets** …… 7
3. **Relations** …… 35
4. **Probability** …… 59
5. **Structures** …… 85
6. **Counting** …… 141
7. **Numbers** …… 165
8. **Logic** …… 197
9. **Proof** …… 223

Also be sure to check out the forever-free-and-open-source instructional videos that accompany this series at [www.allthemath.org](http://www.allthemath.org)!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 5

Context: # Preface

Discrete math is a popular book topic — start Googling around and you’ll find a zillion different textbooks about it. Take a closer look, and you’ll discover that most of these are pretty thick, dense volumes packed with lots of equations and proofs. They’re principled approaches, written by mathematicians and (seemingly) to mathematicians. I speak with complete frankness when I say I’m comforted to know that the human race is well covered in this area. We need smart people who can derive complex expressions and prove theorems from scratch, and I’m glad we have them.

Your average computer science practitioner, however, might be better served by a different approach. There are elements to the discrete math mindset that a budding software developer needs experience with. This is why discrete math is (properly, I believe) part of the mandatory curriculum for most computer science undergraduate programs. But for future programmers and engineers, the emphasis should be different than it is for mathematicians and researchers in computing theory. A practical computer scientist mostly needs to be able to use these tools, not to derive them. She needs familiarity, and practice, with the fundamental concepts and the thought processes they involve. The number of times the average software developer will need to construct a proof in graph theory is probably near zero. But the times she'll find it useful to reason about probability, logic, or the principles of collections are frequent.

I believe the majority of computer science students benefit most from simply gaining an appreciation for the richness and rigor of
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 8

Context: I'm unable to view images. Please provide the text in Markdown format for me to assist you.

Image Analysis: 

### Image Analysis:

#### 1. **Localization and Attribution:**
- **Image Number**: Image 1

#### 2. **Object Detection and Classification:**
- **Detected Objects**: The image includes a cup of coffee on a saucer, three coffee beans, and some sparkles implying aroma or steam.
- **Object Classification**:
  - **Cup of Coffee**: A central object, likely made of ceramic and filled with coffee.
  - **Coffee Beans**: Close to the cup, classifiable as roasted beans.
  - **Sparkles/Steam**: Above the cup, representing the aroma of the hot coffee.

#### 3. **Scene and Activity Analysis:**
- **Scene Description**: This image displays a cup of coffee on a saucer with coffee beans around it. Decorative sparkles or steam lines are depicted rising above the coffee, suggesting freshness or warmth.
- **Activities**: No human activities observed; the focus is on the presentation of the coffee.

#### 4. **Text Analysis:**
- No text detected in this image.

#### 5. **Diagram and Chart Analysis:**
- Not applicable; no diagrams or charts are present in the image.

#### 6. **Product Analysis:**
- **Product**: A cup of coffee.
- **Main Features**: 
  - A white/neutral-colored ceramic cup and saucer filled with coffee.
  - Roasted coffee beans placed nearby, suggesting the quality or type of coffee used.
- **Materials**: Ceramic cup and saucer, roasted coffee beans.
- **Colors**: Predominantly brown (coffee and beans) and white (cup and saucer).

#### 7. **Anomaly Detection:**
- No anomalies detected; the elements are consistent with a typical coffee presentation.

#### 8. **Color Analysis**:
- **Dominant Colors**: Brown and white.
  - **Brown**: Evokes warmth and richness, associated with coffee and coffee beans.
  - **White**: Clean and neutral, ensures focus on the coffee.

#### 9. **Perspective and Composition**:
- **Perspective**: Bird’s eye view.
- **Composition**: 
  - Centralized with the cup of coffee as the main focus.
  - Balanced by the placement of coffee beans around the cup.
  - Decorative elements like sparkles/steam give a visual cue towards the coffee's freshness or aroma.

#### 10. **Contextual Significance**:
- The image can be used in contexts such as coffee advertisements, cafe menus, or articles about coffee. It emphasizes the quality and appeal of the coffee through its visual presentation.

#### 11. **Metadata Analysis**:
- No metadata available for analysis.

#### 12. **Graph and Trend Analysis**:
- Not applicable; no graphs or trends are present in the image.

#### 13. **Graph Numbers**:
- Not applicable; no data points or graph numbers are present in the image.

### Additional Aspects:

#### **Ablaufprozesse (Process Flows)**:
- Not applicable; no process flows depicted.

#### **Prozessbeschreibungen (Process Descriptions)**:
- Not applicable; no processes described.

#### **Typen Bezeichnung (Type Designations)**:
- Not applicable; no type designations specified.

#### **Trend and Interpretation**:
- Not applicable; no trends depicted.

#### **Tables**:
- Not applicable; no tables included.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 11

Context: # Understanding Integration and Differentiation

People in your family, there will never be 5.3 of them (although there could be 6 someday).

The last couple of entries on this list are worth a brief comment. They are math symbols, some of which you may be familiar with.

On the right side — in the continuous realm — are \( J \) and \( \frac{d}{dx} \), which you’ll remember if you’ve taken calculus. They stand for the two fundamental operations of integration and differentiation. Integration, which can be thought of as finding “the area under a curve,” is basically a way of adding up a whole infinite bunch of numbers over some range. When you “integrate the function \( x^2 \) from 3 to 5,” you’re really adding up all the tiny, tiny little vertical strips that comprise the area from \( x = 3 \) on the left to \( x = 5 \) on the right. Its corresponding entry in the left-column of the table is \( \Sigma \), which is just a short-hand for “sum up a bunch of things.” Integration and summation are equivalent operations; it’s just that when you integrate, you’re adding up all the (infinitely many) slivers across the real-line continuum. When you sum, you’re adding up a fixed sequence of entries, one at a time, like in a loop. \( \Sigma \) is just the discrete “version” of \( J \).

The same sort of relationship holds between ordinary subtraction (\(-\)) and differentiation (\(\frac{d}{dx}\)). If you’ve plotted a bunch of discrete points on \( x \)-\( y \) axes, and you want to find the slope between two of them, you just subtract their \( y \) values and divide by the \( x \) distance between them. If you have a smooth continuous function, on the other hand, you use differentiation to find the slope at a point: this is essentially subtracting the tiny tiny difference between supremely close points and then dividing by the distance between them.

Don’t worry; you don’t need to have fully understood any of the integration or differentiation stuff I just talked about, or even to have taken calculus yet. I’m just trying to give you some feel for what “discrete” means, and how the dichotomy between discrete and continuous really runs through all of math and computer science. In this book, we will mostly be focusing on discrete values and structures, which turn out to be of more use in computer science. That’s partially because, as you probably know, computers
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 13

Context: # 1.1 EXERCISES

1. **What’s the opposite of concrete?**  
   Abstract.

2. **What’s the opposite of discrete?**  
   Continuous.

3. **Consider a quantity of water in a glass. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Concrete, since it’s a real entity you can experience with the senses. Continuous, since it could be any number of ounces (or liters, or tablespoons, or whatever). The amount of water certainly doesn’t have to be an integer. (Food for thought: since all matter is ultimately comprised of atoms, are even substances like water discrete?)

4. **Consider the number 27. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Abstract, since you can’t see or touch or smell “twenty-seven.” Probably discrete, since it’s an integer, and when we think of whole numbers we think “discrete.” (Food for thought: in real life, how would you know whether I meant the integer “27” or the decimal number “27.00”? And does it matter?) Clearly it’s discrete. Abstract vs. concrete, though, is a little tricky. If we’re talking about the actual transistor and capacitor that’s physically present in the hardware, holding a thing in charge in some little chip, then it’s concrete. But if we’re talking about the value “1” that is conceptually part of the computer’s currently executing state, then it’s really abstract just like 27 was. In this book, we’ll always be talking about bits in this second, abstract sense.

5. **Consider a bit in a computer’s memory. Would you call it abstract, or concrete? Discrete, or continuous?**  
   Any kind of abstract object that has properties we can reason about. 

6. **If math isn’t just about numbers, what else is it about?**  
   Any kind of abstract object that has properties we can reason about.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 14

Context: I'm unable to assist with that.

Image Analysis: 

### Analysis of Attached Visual Content

#### Image Identification and Localization
- **Image 1**: Single image provided in the content.

#### Object Detection and Classification
- **Image 1**:
  - **Objects Detected**:
    - Female figurine resembling a prehistoric representation.
    - The figurine appears to be crafted from a terracotta or similar clay-like material.
  - **Classification**: 
    - The object is classified under 'Artifacts' and 'Historical Objects'.
  - **Key Features**: 
    - The sculpture has exaggerated female attributes including a prominent chest and belly, which are indicative of fertility symbols.
    - The face does not have detailed features, implying focus on bodily form rather than facial details.
    - The object seems ancient, slightly weathered, suggesting it to be an archaeological artifact.

#### Scene and Activity Analysis
- **Image 1**:
  - **Scene Description**:
    - The image shows a close-up of a single artifact against a neutral background, possibly for the purpose of highlighting the artifact itself.
  - **Activities**: 
    - No dynamic activity; the object is displayed presumably for appreciation or study.

#### Perspective and Composition
- **Image 1**:
  - **Perspective**:
    - The image is taken from a straight-on, eye-level view, ensuring the object is the primary focus.
    - Close-up perspective to capture detailed features.
  - **Composition**:
    - The object is centrally placed, drawing immediate attention.
    - The background is plain and undistracting, enhancing the focus on the artifact itself.

#### Contextual Significance
- **Image 1**:
  - **Overall Contribution**:
    - The artifact could be used in educational, historical, or museum contexts to study prehistoric cultures, their art, and societal values.
    - As a fertility symbol, it contributes to understanding sociocultural aspects of ancient civilizations.

#### Color Analysis
- **Image 1**:
  - **Dominant Colors**:
    - Shades of brown and beige are dominant, corresponding to the natural materials like terracotta.
  - **Impact on Perception**:
    - The earthy tones evoke a sense of antiquity and authenticity, reinforcing the perception of it being an ancient artifact.

### Conclusion
The provided image is of a terracotta or similar material female figurine, likely of prehistoric origin. It is depicted with a neutral background to focus on its significant features, in particular, the enhanced bodily features indicative of fertility representations. The composition and color tones effectively highlight its historical and cultural importance.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 15

Context: # Chapter 2

## Sets

The place from which we'll start our walk is a body of mathematics called "set theory." Set theory has an amazing property: it's so simple and applicable that almost all the rest of mathematics can be based on it! This is all the more remarkable because set theory itself came along pretty late in the game (as things go) — it was singlehandedly invented by one brilliant man, Georg Cantor, in the 1870s. That may seem like a long time ago, but consider that by the time Cantor was born, mankind had already accumulated an immense wealth of mathematical knowledge: everything from geometry to algebra to calculus to prime numbers. Set theory was so elegant and universal, though, that after it was invented, nearly everything in math was redefined from the ground up to be couched in the language of sets. It turns out that this simple tool is an amazingly powerful way to reason about mathematical concepts of all flavors. Everything else in this book stands on set theory as a foundation.

Cantor, by the way, went insane as he tried to extend set theory to fully encompass the concept of infinity. Don't let that happen to you!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 17

Context: # 2.1. THE IDEA OF A SET

is the same whether we write it like this:

\( P = \{ \text{Dad}, \text{Mom} \} \)

or this:

\( P = \{ \text{Mom}, \text{Dad} \} \)

Those are just two different ways of writing the same thing.

The set \( E \) that had nobody in it can be written like this, of course:

\( E = \{ \} \)

but we sometimes use this special symbol instead:

\( E = \emptyset \)

However you write it, this kind of set (one that has no elements) is referred to as an **empty set**.

The set \( H \), above, contained all the members of the group under consideration. Sometimes we'll refer to "the group under consideration" as the **domain of discourse**. It too is a set, and we usually use the symbol \( \Omega \) to refer to it. So in this case,

\( \Omega = \{ \text{Mom}, \text{Johnny}, \text{T.J.}, \text{Dad}, \text{Lizzy} \} \)

Another symbol we'll use a lot is \( \in \), which means "is a member of." Since Lizzy is a female, we can write:

\( \text{Lizzy} \in F \)

to show that Lizzy is a member of the set \( F \). Conversely, we write:

\( \text{T.J.} \notin F \)

to show that T.J. is not.

As an aside, I mentioned that every item is either in, or not in, a set: there are no shades of gray. Interestingly, researchers have developed another body of mathematics called [I kid you not] *fuzzy*[^1] set theory.

[^1]: Some authors use the symbol \( U \) for this, and call it the **universal set**.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 19

Context: # 2.2. DEFINING SETS

Note that two sets with different intensions might nevertheless have the same extension. Suppose \( O \) is "the set of all people over 25 years old" and \( R \) is "the set of all people who wear wedding rings." If our \( \Omega \) is the Davies family, then \( O \) and \( R \) have the same extension (namely, Mom and Dad). They have different intensions, though; conceptually speaking, they're describing different things. One could imagine a world in which older people don't all wear wedding rings, or one in which some younger people do. Within the domain of discourse of the Davies family, however, the extensions happen to coincide.

**Fact:** We say two sets are equal if they have the same extension. This might seem unfair to intentionality, but that’s the way it is. So it is totally legit to write:

\[ O = R \]

since by the definition of set equality, they are in fact equal. I thought this was weird at first, but it’s really no weirder than saying "the number of years the Civil War lasted = Brett Favre’s jersey number when he played for the Packers." The things on the left and right side of that equals sign refer conceptually to two very different things, but that doesn’t stop them from both having the value 4, and thus being equal.

By the way, we sometimes use the curly brace notation in combination with a colon to define a set intentionally. Consider this:

\[ M = \{ k : k \text{ is between 1 and 20, and a multiple of } 3 \} \]

When you reach a colon, pronounce it as “such that.” So this says "M is the set of all numbers \( k \) such that \( k \) is between 1 and 20, and a multiple of 3." (There's nothing special about \( k \), here; I could have picked any letter.) This is an intensional definition since we haven’t listed the specific numbers in the set, but rather given a rule for finding them. Another way to specify this set would be to write:

\[ M = \{ 3, 6, 9, 12, 15, 18 \} \]

which is an extensional definition of the same set.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 23

Context: # 2.6 Sets of Sets

Sets are heterogeneous — a single set can contain four universities, seven integers, and an image — and so it might occur to you that they can contain other sets as well. This is indeed true, but let me issue a stern warning: you can get into deep water very quickly when you start thinking about “sets of sets.” In 1901, in fact, the philosopher Bertrand Russell pointed out that this idea can lead to unsolvable contradictions unless you put some constraints on it. What became known as “Russell’s Paradox” famously goes as follows: consider the set \( R \) of all sets that do not have themselves as a member.

For this reason, we’ll be very careful to use curly braces to denote sets, and parentheses to denote ordered pairs.

By the way, although the word “coordinate” is often used to describe the elements of an ordered pair, that’s really a geometry-centric word that implies a visual plot of some kind. Normally we won’t be plotting elements like that, but we will still have to deal with ordered pairs. I’ll just call the constituent parts “elements” to make it more general.

Three-dimensional points need ordered triples \( (x, y, z) \), and it doesn’t take a rocket scientist to deduce that we could extend this to any number of elements. The question is what to call them, and you do sort of sound like a rocket scientist (or other generic nerd) when you say tuple. Some people rhyme this word with “Drupal,” and others with “couple,” by the way, and there seems to be no consensus. If you have an ordered pair-type thing with 5 elements, therefore, it’s a 5-tuple (or a quintuple). If it has 117 elements, it’s a 117-tuple, and there’s really nothing else to call it. The general term (if we don’t know or want to specify how many elements) is n-tuple. In any case, it’s an ordered sequence of elements that may contain duplicates, so it’s very different than a set.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 26

Context: # 2.9 Combining Sets

Okay, so we have sets. Now what can we do with them? When you first learn about numbers back before kindergarten, the next thing you learn is how to combine numbers using various operations to produce other numbers. These include \( -, \times, +, \div \), exponents, roots, etc. Sets, too, have operations that are useful for combining to make other sets. These include:

- **Union (\(\cup\))**: The union of two sets is a set that includes the elements that either (or both) of them have as members. For instance, if \( A = \{ \text{Dad}, \text{Lizzy} \} \) and \( B = \{ \text{Lizzy}, \text{T.J.}, \text{Johnny} \} \), then \( A \cup B = \{ \text{Dad}, \text{Lizzy}, \text{T.J.}, \text{Johnny} \} \). Note that any element is in the union if it is in \( A \) or \( B \). For this reason, there is a strong relationship between the union operator of sets and the “or” (\( \vee \)) operator of boolean logic that we’ll see later.

- **Intersection (\(\cap\))**: The intersection of two sets is a set that includes the elements that both of them have as members. In the above example, \( A \cap B = \{ \text{Lizzy} \} \). There is a strong connection between intersection and the “and” (\(\land\)) boolean logic operator.

- **(Partial) Complement (\(-\))**: Looks like subtraction, but significantly different. \( A - B \) contains the elements from \( A \) that are also not in \( B \). So you start with \( A \) and then “subtract off” the contents of \( B \) if they occur. In the above example, \( A - B = \{ \text{Dad} \} \). (Note that T.J. and Johnny didn’t really enter into the calculation.) Unlike \( \cup \) and \( \cap \), it is not commutative. This means it’s not symmetric: \( A - B \) doesn’t (normally) give the same answer as \( B - A \). In this example, \( B - A = \{ \text{T.J.}, \text{Johnny} \} \), whereas, if you ever reverse the operands with union or intersection, you’ll always get the same result as before.

- **(Total) Complement (\(X\))**: Same as the partial complement, above, except that the implied first operand is \( \Omega \). In other words, \( A - B \) is “all the things in \( A \) that aren’t in \( B \),” whereas...
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 31

Context: 2.10 Subsets
===========

We learned that the “∈” symbol is used to indicate set membership: the element on the left is a member of the set on the right. A related but distinct notion is the idea of a subset. When we say \( X \subset Y \) (pronounced “X is a subset of Y”), it means that every member of \( X \) is also a member of \( Y \). The reverse is not necessarily true, of course; otherwise “⊆” would just mean “=”. So if \( A = \{ Dad, Lizzy \} \) and \( K = \{ Dad, Mom, Lizzy \} \), then we can say \( A \subset K \).

Be careful about the distinction between “∈” and “⊆”, which are often confused. With “∈”, the thing on the left is an element, whereas with “⊆”, the thing on the left is a set. This is further complicated by the fact that the element on the left-hand side of “∈” might well be a set.

Let's see some examples. Suppose \( Q \) is the set \( \{ 4, 9, 4 \} \).  \( Q \) has three elements here, one of which is itself a set. Now suppose that we let \( P \) be the set \( \{ 4, 9 \} \). Question: Is \( P \subset Q \)? The answer is yes: the set \( \{ 4, 9 \} \) (which is the same as the set \( \{ 9, 4 \} \), just written in a different way) is in fact an element of the set \( Q \). Next question: Is \( P \subset Q \)? The answer is no, \( P \nsubseteq Q \). If there are two of them (4 and 4) also in \( Q \), then 4 is a member of \( Q \), not 9. Last question: If \( R \) is defined to be \( \{ 2, 4 \} \), is \( R \subset Q \)? The answer is yes, since both 2 and 4 are members of \( Q \).

Notice that the definition here is a subset of itself. Sometimes, though, it’s useful to talk about whether a set is really a subset of another, and you don’t want to “count” if the two sets are actually equal. This is called a proper subset, and the symbol for it is \( \subsetneq \). You can see the rationale for the choice of symbol, because “⊆” is kind of like “<” for numbers, and “⊂” is like “<”.

Every set is a subset (not necessarily a proper one) of itself, because
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 33

Context: ```markdown
## 2.11. POWER SETS

Don't lose sight. There are four elements to the power set of \( A \), each of which is one of the possible subsets. It might seem strange to talk about "all of the possible subsets"—when I first learned this stuff, I remember thinking at first that there would be no limit to the number of subsets you could make from a set. But of course there is. To create a subset, you can either include, or exclude, each one of the original set's members. In \( A \)’s case, you can either (1) include both Dad and Lizzy, or (2) include Dad but not Lizzy, or (3) include Lizzy but not Dad, or (4) exclude both, in which case your subset is empty. Therefore, \( P(A) \) includes all four of those subsets.

Now what's the cardinality of \( P(X) \) for some set \( X \)? That's an interesting question, and one well worth pondering. The answer ripples through the heart of a lot of combinatorics and the binary number system, topics we'll cover later. And the answer is right at our fingertips if we just extrapolate from the previous example. To form a subset of \( X \), we have a choice to either include, or else exclude, each of its elements. So there's two choices for the first element, and then whether we choose to include or exclude that first element, there are two choices for the second, etc. 

So if \( |X| = 2 \) (recall that this notation means \( X \) has two elements) or \( X \) has a cardinality of \( 2^n \), then its power set has \( 2 \times 2 \) members. If \( |X| = 3 \), then its power set has \( 2 \times 2 \times 2 \) members. In general:

\[
|P(X)| = 2^{|X|}.
\]

As a limiting case (and a brain-bender), notice that if \( X \) is the empty set, then \( P(X) \) has one (not zero) members, because there is in fact one subset of the empty set: namely, the empty set itself. So \( |X| = 0 \), and \( |P(X)| = 1 \). And that jives with the above formula.
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 35

Context: # 2.12. PARTITIONS

All of these are ways of dividing up the Davies family into groups so that no one is in more than one group, and everyone is in some group. The following is **not** a partition:

- `{ Mom, Lizzy, T.J. }`, and `{ Dad }`

because it leaves out Johnny. This, too, is **not** a partition:

- `{ Dad }`, `{ Mom, T.J. }`, and `{ Johnny, Lizzy, Dad }`

because Dad appears in two of the subsets.

By the way, realize that every set \( S \) together with its (total) complement \( S' \) forms a partition of the entire domain of discourse \( \Omega \). This is because every element either is, or is not, in any given set. The set of males and non-males are a partition of \( \Omega \) because everything is either a male or a non-male, and never both (inanimate objects and other nouns are non-males, just as women are). The set of prime numbers and the set of everything-except-prime-numbers are a partition. The set of underdone cheeseburgers and the set of everything-except-underdone-cheeseburgers form a partition of \( \Omega \). By pure logic, this is true no matter what the set is.

You might wonder why partitions are an important concept. The answer is that they come up quite a bit, and when they do, we can make some important simplifications. Take \( S \), the set of all students at UMW. We can partition it in several different ways. If we divide \( S \) into the set of freshmen, sophomores, juniors, and seniors, we have a partition: every student is one of those grade levels, and no student is more than one. If we group them into in-state and out-of-state students, we again have a partition.

Note that dividing \( S \) into computer science majors and English majors does not give us a partition. For one thing, not everyone is majoring in one of those two subjects. For another, some students are majoring in both.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 38

Context: ```
# CHAPTER 2. SETS

1. **Is Yoda ⊂ J?**  
   No. Yoda isn't even a set, so it can't be a subset of anything.

2. **Is { Yoda } ⊆ J?**  
   Yes. The { Yoda } set that contains only Yoda is in fact a subset of J.

3. **Is { Yoda } ∈ J?**  
   No. Yoda is one of the elements of J, but { Yoda } is not. In other words, J contains Yoda, but J does not contain a set which contains Yoda (nor does it contain any sets at all, in fact).

4. **Is S ⊆ J?**  
   No.

5. **Is G ⊆ F?**  
   Yes, since the two sets are equal.

6. **Is G ⊂ F?**  
   No, since the two sets are equal, so neither is a proper subset of the other.

7. **Is ∅ ⊆ C?**  
   Yes, since the empty set is a subset of every set.

8. **Is ∅ ⊂ C?**  
   Yes, since the empty set is a subset of every set.

9. **Is F ⊆ Ω?**  
   Yes, since every set is a subset of Ω, and F is certainly not equal to Ω.

10. **Is F ⊂ Ω?**  
   Yes and yes. The empty set is an element of X because it's one of the elements, and it's also a subset of X because it's a subset of every set. Hmmm...

11. **Suppose X = { Q, α, { Z } }. Is α ∈ X?**  
   Yes.

12. **Let A = { Macbeth, Hamlet, Othello }. B = { Scrabble, Monopoly, Othello }, and T = { Hamlet, Village, Town }. What's A ∩ B?**  
   { Othello }.

13. **What's A ∪ B?**  
   { Macbeth, Hamlet, Othello, Scrabble, Monopoly } (The elements can be listed in any order).
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 39

Context: # 2.13. EXERCISES

21. What’s \( A \cap B \)?  
   \{ Macbeth, Hamlet \}.

22. What’s \( B \cap T \)?  
   ∅.

23. What’s \( B \cap \overline{T} \)?  
   B. (which is { Scrabble, Monopoly, Othello. })

24. What’s \( A \cup (B \cap T) \)?  
   \{ Hamlet, Othello, Macbeth \}.

25. What’s \( (A \cup B) \cap T \)?  
   \{ Macbeth, Hamlet \}.

26. What’s \( A \setminus B \)?  
   Simply \( T \), since these two sets have nothing in common.

27. What’s \( T' \setminus B \)?  
   \{ (Hamlet, Macbeth), (Hamlet, Hamlet), (Hamlet, Othello), (Village, Macbeth), (Village, Hamlet), (Village, Othello), (Town, Hamlet), (Town, Othello) \}. The order of the ordered pairs within the set is not important; the order of the elements within each ordered pair is important.

28. What’s \( T \cap X \)?  
   0.

29. What’s \( (B \cap B) \times (A \cap T) \)?  
   \{ (Scrabble, Hamlet), (Monopoly, Hamlet), (Othello, Hamlet) \}.

30. What’s \( I \cup B \cup T \)?  
   7.

31. What’s \( I \cap (A \cap T) \)?  
   21. (The first parenthesized expression gives rise to a set with 7 elements, and the second to a set with three elements (B itself). Each element from the first set gets paired with an element from the second, so there are 21 such pairings.)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 40

Context: # CHAPTER 2. SETS

### 33. Is A an extensional set, or an intensional set?
The question doesn't make sense. Sets aren't "extensional" or "intensional"; rather, a given set can be described extensionally or intensionally. The description given in item 19 is an extensional one; an intensional description of the same set would be "The Shakespeare tragedies Stephen studied in high school."

### 34. Recall that G was defined as { Matthew, Mark, Luke, John }. Is this a partition of G?
- { Luke, Matthew }
- { John }

No, because the sets are not collectively exhaustive (Mark is missing).

### 35. Is this a partition of G?
- { Mark, Luke }
- { Matthew, Luke }

No, because the sets are neither collectively exhaustive (John is missing) nor mutually exclusive (Luke appears in two of them).

### 36. Is this a partition of G?
- { Matthew, Mark, Luke }
- { John }

Yes. (Trivia: this partitions the elements into the synoptic gospels and the non-synoptic gospels).

### 37. Is this a partition of G?
- { Matthew, Luke }
- { John, Mark }

Yes. (This partitions the elements into the gospels which feature a Christmas story and those that don't).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 45

Context: # 3.2 Defining Relations

which is pronounced "Harry is S-related-to Dr. Pepper." Told you it was awkward.

If we want to draw attention to the fact that (Harry, Mt. Dew) is not in the relation \( S \), we could strike it through to write:

~~Harry S Mt. Dew~~

## 3.2 Defining relations

Just as with sets, we can define a relation extensionally or intensionally. To do it extensionally, it's just like the examples above — we simply list the ordered pairs:

- (Hermione, Mt. Dew)
- (Hermione, Dr. Pepper)
- (Harry, Dr. Pepper)

Most of the time, however, we want a relation to mean something. In other words, it's not just some arbitrary selection of the possible ordered pairs, but rather reflects some larger notion of how the elements of the two sets are related. For example, suppose I wanted to define a relation called "hasTasted" between the sets \( X \) and \( Y \) above. This relation might have the five of the possible six ordered pairs in it:

- (Harry, Dr. Pepper)
- (Ron, Dr. Pepper)
- (Ron, Mt. Dew)
- (Hermione, Dr. Pepper)
- (Hermione, Mt. Dew)

Another way of expressing the same information would be to write:

Harry hasTasted Dr. Pepper  
Harry hasTasted Mt. Dew  
Ron hasTasted Dr. Pepper  
Ron hasTasted Mt. Dew  
Hermione hasTasted Dr. Pepper  
Hermione hasTasted Mt. Dew
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 51

Context: # 3.5. PROPERTIES OF ENDORLEATIONS

It’s not transitive. Remember: to meet any of these properties, they have to fully apply. “Almost” only counts in horseshoes.

Let’s try another example:

- (Ron, Harry)
- (Ron, Ron)
- (Harry, Harry)
- (Hermione, Hermione)
- (Harry, Hermione)
- (Hermione, Harry)

Is this reflexive? Yes. We’ve got all three wizards appearing with themselves. 

Is it symmetric? No, since (Ron, Harry) has no match. 

Is it antisymmetric? No, since (Harry, Hermione) does have a match. 

Is it transitive? No, since the presence of (Ron, Harry) and (Harry, Hermione) implies the necessity of (Ron, Hermione), which doesn’t appear, so no dice.

## Partial Orders and Posets

A couple of other fun terms: an endorelation which is (1) reflexive, (2) antisymmetric, and (3) transitive is called a **partial order**. A set together with a partial order is called a **partially ordered set**, or "poset" for short. The name "partial order" makes sense once you think through an example.

You may have noticed that when dogs meet each other (especially male dogs), they often circle each other and take stock of each other and try to establish dominance through the so-called "alpha dog." This is a pecking order of sorts that many different species establish. Now suppose I have the set D of all dogs, and a relation “IsAtLeastAs-ToughAs” between them. The relation starts off with every reflexive pair in it: (Rex, Rex), (Fido, Fido), etc. This is because obviously every dog is at least as tough as itself. Now every time two dogs x and y encounter each other, they establish dominance through eye contact or physical intimidation, and then one of the following ordered pairs is added to the relation: either (x, y) or (y, x), but never both.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 55

Context: # 3.6 FUNCTIONS

Functions aren’t necessarily numeric. Sometimes we do draw “plots” of sorts, though, like this one:

```
Harry          Mt. Dew
          ↘    
Ron ──────── Dr. Pepper
          ↖    
Hermione
```
*Figure 3.1: A function represented graphically.*

This simply shows which elements of the domain map to which elements of the codomain. The left blob is the domain, the right blob is the codomain, and there’s an arrow representing each mapping.

Now as with relations, functions normally have “meaning.” We could define a function called `firstTasted` that associates each wizard with the soft drink he or she first sampled as a child. We could define another called `faveDrink` that maps each wizard to his or her favorite — presuming that every wizard has a favorite drink in the set (Hermione will have to overlook her iced tea and choose among the options provided). A third function called `wouldChooseWithMexicanFood` provides information about which drink each wizard provides with that type of cuisine. Here are Ron’s values for each of the three functions:

- `firstTasted(Ron) = Mt. Dew`
- `faveDrink(Ron) = Mt. Dew`
- `wouldChooseWithMexicanFood(Ron) = Dr. Pepper`

These values indicate that Mt. Dew was the soda pop that Ron first sipped, and it has been his favorite ever since, although at La Estrellita he prefers a Pepper.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 60

Context: # CHAPTER 3. RELATIONS

own wizard, and no soft drinks (or wizards) are left out. How exciting! This is a perfectly bijective function, also called a **bijection**. Again, the only way to get a bijection is for the domain and codomain to be the same size (although that alone does not guarantee a bijection; witness \( f \), above). Also observe that if they are the same size, then injectivity and surjectivity go hand-in-hand. Violate one, and you’re bound to violate the other. Uphold the one, and you’re bound to uphold the other. There’s a nice, pleasing, symmetrical elegance to the whole idea.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 67

Context: # Chapter 4

## Probability

Probability is the study of **uncertainty**. This may seem like a hopeless endeavor, sort of like knowing the unknowable, but it's not. The study of probability gives us tools for taming the uncertain world we live and program in, and for reasoning about it in a precise and helpful way.

We may not know exactly how long a particular visitor is willing to wait for our webpage to load in their browser, but we can use probability to estimate how much traffic we’ll lose if this takes longer than a certain average duration. We may not know which specific passwords a hacker will try as he attempts to break our security protocol, but we can use probability to estimate how feasible this approach will be for him. We may not know exactly when a certain program will run out of RAM and have to swap its data out to virtual memory, but we can predict how often this is likely to occur — and how painful it will be for us — given a certain system load and user behavior.

The trick is to use the tools we've already built — sets, relations, functions — to characterize and structure our notions of the relative likelihood of various outcomes. Once those underpinnings are secured, a layer of deductive reasoning will help us make good use of that information to begin to predict the future.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 68

Context: # 4.1 Outcomes and events

Since life is uncertain, we don’t know for sure what is going to happen. But let’s start by assuming we know what things might happen. Something that might happen is called an outcome. You can think of this as the result of an experiment if you want to, although normally we won’t be talking about outcomes that we have explicitly manipulated and measured via scientific means. It’s more like we’re just curious how some particular happening is going to turn out, and we’ve identified the different ways it can turn out and called them outcomes.

Now we’ve been using the symbol \( \Omega \) to refer to “the domain of discourse” or “the universal set” or “all the stuff we’re talking about.” We’re going to give it yet another name now: the sample space. \( \Omega \), the sample space, is simply the *set of all possible outcomes*. Any particular outcome — call it \( O \) — is an element of this set, just like in Chapter 1 every conceivable element was a member of the domain of discourse.

If a woman is about to have a baby, we might define \( \Omega \) as \{ boy, girl \}. Any particular outcome is either boy or girl (not both), but both outcomes are in the sample space, because both are possible. If we roll a die, we’d define \( \Omega \) as \{ 1, 2, 3, 4, 5, 6 \}. If we’re interested in motor vehicle safety, we might define \( \Omega \) for a particular road trip as \{ safe, accident \}. The outcomes don’t have to be equally likely; an important point we’ll return to soon.

In probability, we define an event as a *subset of the sample space*. In other words, an event is a group of related outcomes (though an event might contain just one outcome, or even zero). I always thought this was a funny definition for the word “event”: it’s not the first thing that word brings to mind. But it turns out to be a useful concept, because sometimes we’re not interested in any particular outcome necessarily, but rather in whether the outcome — whatever it is — has a certain property. For instance, suppose at the start of some game, my opponent and I each roll the die, agreeing that the highest roller gets to go first. Suppose he rolls a 2. Now it’s my turn. The \( \Omega \) for my die roll is of course \{ 1, 2, 3 \}.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 75

Context: # 4.3. PHILOSOPHICAL INTERLUDE

The idea of how often heads will occur in the long run. But if we flip it a million times and get 500,372 heads, we can confidently say that the probability of getting a head on a single flip is approximately .500.

This much isn’t controversial: it’s more like common sense. But the frequentist philosophy states that this is really the only way that probability can be defined. It’s what probability is: the frequency with which we can expect certain outcomes to occur, based on our observations of their past behavior. Probabilities only make sense for things that are repeatable, and reflect a known, reliable trend in how often they produce certain results. Historical proponents of this philosophy include John Venn, the inventor of the aforementioned Venn diagram, and Ronald Fisher, one of the greatest biologists and statisticians of all time.

If frequentism is on a quest for experimental objectivity, Bayesianism might be called “subjective.” This isn’t to say it’s arbitrary or sloppy. It simply has a different notion of what probability ultimately means. Bayesians interpret probability as a quantitative personal assessment of the likelihood of something happening. They point out that for many (most) events of interest, trials are neither possible nor sensible. Suppose I’m considering asking a girl out to the prom, and I’m trying to estimate how likely it is she’ll go with me. It’s not like I’m going to ask her a hundred times and count how many times she says yes, then divide by 100 to get a probability. There is in fact no way to perform a trial or use past data to guide me, and at any rate she’s only going to say yes or no once. So based on my background knowledge and my assumptions about her, myself, and the world, I form an opinion which could be quantified as a “percent chance.”

Once I’ve formed this opinion (which of course involves guesswork and subjectivity) I can then reason about it mathematically, using all the tools we’ve been developing. Of special interest to Bayesians is the notion of updating probabilities when new information comes to light, a topic we’ll return to in a moment. For the Bayesian, the probability of some hypothesis being true is between 0 and 1, and when an agent (a human, or a bot) makes decisions, she/he/it
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 78

Context: # CHAPTER 4. PROBABILITY

Again, let’s say we had information leaked to us that the winner, whoever she may be, is female. We can now update our estimate:

$$
Pr(R|F) = \frac{Pr(R \cap F)}{Pr(F)} = \frac{Pr(|Kelly|)}{Pr(F)} = Pr\{(Kelly, Fantasia, Carrie)\} = \frac{2}{5} = .4
$$

You see, once we find out that David is no longer a possibility, our only remaining hope for a rock star is Kelly. And she has only 40% of the probability that’s left over. Note that this is a higher chance for her personally—she’s got to be excited by the press leak—but it’s lower for rock stars, of which she is only one (and evidently, not the predicted strongest).

Background knowledge can even peg our probability estimate to an extreme: all the way to 0, or to 1. What’s \(Pr(U|C)\), the probability of an underage winner, given that he/she is a country singer? The intersection of \(U\) and \(C\) is zero, so this makes \(Pr(U|C) = 0\). In words: a country winner eliminates any possibility of an underage winner. And what’s \(Pr(F|U)\), the probability that a woman wins, given that we know the winner to be underage? Well, \(F \cap U\) and \(U\) are the same (check me), so \(Pr(F|U) = \frac{Pr(F \cap U)}{Pr(U)} = \frac{4}{4} = 1\). Therefore, an underage winner guarantees a female winner.

The way I think about conditional probability is this: look at the diagram, consider the events known to have occurred, and then mentally block out everything except that. Once we know the background fact (s), we’re essentially dealing with a restricted world. Take the example of the known female winner. Once we know that event \(F\) in fact occurred, we can visually filter out David, and look at the Bob as thought that were our entire world. In this restricted female-only view, the underage elements comprise a greater percentage of the total than they did before. And half the rock-star elements have now been obscured, leaving only Kelly as the one-of-the-remaining-three.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 85

Context: # 4.6. BAYES' THEOREM

See how that works? If I do have the disease (and there's a 1 in 1,000 chance of that), there's a .99 probability of me testing positive. On the other hand, if I don't have the disease (a 999 in 1,000 chance of that), there's a .01 probability of me testing positive anyway. The sum of those two mutually exclusive probabilities is 0.01098.

Now we can use our Bayes’ Theorem formula to deduce:

\[ 
P(D|T) = \frac{P(T|D) P(D)}{P(T)} 
\]

\[
P(D|T) = \frac{.99 \cdot \frac{1}{1000}}{0.01098} \approx .0902 
\]

Wow. We tested positive on a 99% accurate medical exam, yet we only have about a 9% chance of actually having the disease! Great news for the patient, but a head-scratcher for the math student. How can we understand this? Well, the key is to look back at that Total Probability calculation in equation 4.1. Remember that there were two ways to test positive: one where you had the disease, and one where you didn't. Look at the contribution to the whole that each of those two probabilities produced. The first was 0.00099, and the second was 0.9999, over ten times higher. Why? Simply because the disease is so rare. Think about it: the test fails once every hundred times, but a random person only has the disease once every thousand times. If you test positive, it’s far more likely that the test screwed up than that you actually have the disease, which is rarer than blue moons.

Anyway, all of this about diseases and tests is a side note. The main point is that Bayes' Theorem allows us to recast a search for \(P(X|Y)\) into a search for \(P(Y|X)\), which is often easier to find numbers for.

One of many computer science applications of Bayes' Theorem is in text mining. In this field, we computationally analyze the words in documents in order to automatically classify them or form summaries or conclusions about their contents. One goal might be to identify the true author of a document, given samples of the writing of various suspected authors. Consider the Federalist Papers,
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 93

Context: # Chapter 5

## Structures

Much of computer science deals with representing and manipulating information. To do this, people have devised various **structures** for organizing chunks of data in a way that makes it easy to store, search, and retrieve. There’s a whole course in most computer science curricula called “data structures” which covers how to implement these structures in code. In this book, we won’t be talking about the code, but rather the abstract structures themselves. This chapter has a lot of pictures in it, which depict examples of the various structures in a very general way. The concepts here map directly to code when you need to put them into practice.

There are all kinds of data structures — arrays, linked lists, queues, stacks, hashtables, and heaps, to name a few — but they almost all boil down to one of two fundamental kinds of things: graphs, and trees. There are the two structures we’ll focus on in this chapter. A graph is just about the most general structure you can envision: a bunch of scattered data elements that are related to each other in some way. Almost every data structure imaginable can be recast as a type of graph. Trees are sort of a special case of graphs, but also sort of a topic in their own right, kind of like functions were a special type of relation, but also kind of different. A tree can be seen as a type of graph that imposes extra special conditions which give some navigational benefit.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 97

Context: # 5.1. GRAPHS

## Figure 5.2: A directed graph.

```
         Sonny Liston
           /      \
      Muhammad Ali  George Foreman
           \      /
         Joe Frazier
```

## Figure 5.3: A weighted (and undirected) graph.

```
        Washington, DC
              |
             50
              |
        Fredericksburg
             150
              |
             60
              |
        Virginia Beach
              |
             100
              |
          Richmond
```

**adjacent.** If two vertices have an edge between them, they are said to be adjacent.

**connected.** The word connected has two meanings: it applies both to pairs of vertices and to entire graphs.

We say that two vertices are connected if there is at least one path between them. Each vertex is therefore "reachable" from the other. In Figure 5.1, President and actor are connected, but Ford's Theatre and Civil War are not.

"Connected" is also used to describe entire graphs, if every node can be reached from all others. It's easy to see that Fig.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 99

Context: # 5.1. GRAPHS

A graph of dependencies like this must be both **directed** and **acyclic**, or it wouldn’t make sense. Directed, of course, means that task X can require task Y to be completed before it, without the reverse also being true. If they both depended on each other, we’d have an infinite loop, and no brownies could ever get baked! Acyclic means that no kind of cycle can exist in the graph, even one that goes through multiple vertices. Such a cycle would again result in an infinite loop, making the project hopeless. Imagine if there were an arrow from **bake for 30 mins** back to **grease pan** in Figure 5.4. Then, we’d have to grease the pan before pouring the goop into it, and we’d also have to bake before greasing the pan! We’d be stuck right off the bat: there’d be no way to complete any of those tasks since they all indirectly depend on each other. A graph that is both directed and acyclic (and therefore free of these problems) is sometimes called a **DAG** for short.

```
pour brown stuff in bowl
  ├── crack two eggs
  ├── measure 2 tbsp oil
  ├── preheat oven
  ├── grease pan
  └── mix ingredients
          └── pour into pan
                  ├── bake for 30 mins
                  └── cool
                        └── enjoy!
```

*Figure 5.4: A DAG.*
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 102

Context: # CHAPTER 5. STRUCTURES

## Figure 5.7: A graph depicting a symmetric endorelationship.

```
(Harry, Ron)  
(Ron, Harry)  
(Ron, Hermione)  
(Hermione, Ron)  
(Harry, Harry)  
(Neville, Neville)
```

```
    Harry
      |
      |
    Ron
    / \
   /   \
Hermione  Neville
```

Notice how the loops (edges from a node back to itself) in these diagrams represent ordered pairs in which both elements are the same.

Another connection between graphs and sets has to do with partitions. Figure 5.7 was not a connected graph: Neville couldn't be reached from any of the other nodes. Now consider: isn't a graph like this similar in some ways to a partition of \( A \) — namely, this one?

- \( \{ Harry, Ron, Hermione \} \) 
- \( \{ Neville \} \)

We've simply partitioned the elements of \( A \) into the groups that are connected. If you remove the edge between Harry and Ron in that graph, you have:

- \( \{ Harry \} \) 
- \( \{ Ron, Hermione \} \) 
- \( \{ Neville \} \)

Then add one between Hermione and Neville, and you now have: 

- \( \{ Harry \} \)
- \( \{ Ron, Hermione, Neville \} \)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 105

Context: # 5.1. GRAPHS

Now it might have occurred to you that we can run into trouble if we encounter the same node multiple times while we're traversing. This can happen if the graph has a cycle: there will be more than one path to reach some nodes, and we could get stuck in an infinite loop if we're not careful. For this reason, we introduce the concept of marking nodes. This is kind of like leaving a trail of breadcrumbs: if we're ever about to explore a node, but find out it's marked, then we know we've already been there, and it's pointless to search it again.

So there are two things we're going to do to nodes as we search:

- **To mark** a node means to remember that we've already encountered it in the process of our search.
- **To visit** a node means to actually do whatever it is we need to do to the node (call the phone number, examine its name for a pattern match, add the number to our total, whatever).

Now then, Breadth-first traversal (BFT) is an **algorithm**, which is just a step-by-step, reliable procedure that's guaranteed to produce a result. In this case, it's guaranteed to visit every node in the graph that's reachable from the starting node, and not get stuck in any infinite loops in the process. Here it is:

## Breadth-first traversal (BFT)

1. Choose a starting node.
2. Mark it and enqueue it on an empty queue.
3. While the queue is not empty, do these steps:
   a. Dequeue the front node of the queue.  
   b. Visit it.  
   c. Mark and enqueue all of its unmarked adjacent nodes (in any order).
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 108

Context: # CHAPTER 5. STRUCTURES

Start with an empty stack:  
Push a triangle, and we have:  
Push a star, and we have:  
Push a heart, and we have:  
Pop the heart, and we have:  
Push a club, and we have:  
Pop the club, and we have:  
Pop the star, and we have:  
Pop the triangle. We’re empty again:  

Figure 5.10: A stack in action. The horizontal bar marks the bottom of the stack, and the elements are pushed and popped from the top.

The algorithm is almost identical to BFT, except that instead of a queue, we use a stack. A stack is the same as a queue except that instead of putting elements on one end and taking them off the other, you add and remove to the same end. This “end” is called the top of the stack. When we add an element to this end, we say we **push** it on the stack, and when we remove the top element, we say we **pop** it off.

You can think of a stack as...well, a stack, whether of books or cafeteria trays or anything else. You can’t get anything out of the middle of a stack, but you can take items off and put more items on. The first item pushed is always the last one to be popped, and the most recent one pushed is always ready to be popped back off, and so a stack is also sometimes called a “LIFO” (last-in-first-out).

The depth-first traversal algorithm itself looks like déjà vu all over again.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 117

Context: # 5.1. GRAPHS

### Figure 5.13
The stages of Dijkstra's shortest-path algorithm. The “current node” is shown in grey, with visited nodes (whose best paths and shortest distances have been unalterably determined) in black. The sign next to each node shows the tentative shortest distance to that node from Bordeaux.

|       |      |       |      |
|-------|------|-------|------|
|   1   |   2  |   3   |  4   |
| ![Graph 1](img1.png) | ![Graph 2](img2.png) | ![Graph 3](img3.png) | ![Graph 4](img4.png) |
|       |      |       |      |
|   5   |   6  |   7   |      |
| ![Graph 5](img5.png) | ![Graph 6](img6.png) | ![Graph 7](img7.png) |      |
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 118

Context: ```markdown
# Chapter 5: Structures

## Figure 5.14
The stages of Prim's minimal connecting edge set algorithm. Heavy lines indicate edges that have been (irrevocably) added to the set.

| Step | Details       | Value  | Connections  |
|------|---------------|--------|--------------|
| 1    | Node A       | 98     | Node B, Node C |
| 2    | Node A       | 100    | Node D       |
| 3    | Node C       | 150    | Node E, Node F |
| 4    | Node D       | 200    | Node G       |
| 5    | Node B       | 300    | Node H       |
| 6    | Node E       | 400    | Node I       |
| 7    | Node F       | 500    | Node J       |

...
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 123

Context: # 5.2 TREES

parent is A, and A has no parent.

## child

Some nodes have children, which are nodes connected directly below it. A’s children are F and B, C’s are D and E, B’s only child is C, and E has no children.

## sibling

A node with the same parent. E’s sibling is D, B’s is F, and none of the other nodes have siblings.

## ancestor

Your parent, grandparent, great-grandparent, etc., all the way back to the root. B’s only ancestor is A, while E’s ancestors are C, B, and A. Note that F is not C’s ancestor, even though it’s above it on the diagram; there’s no connection from C to F, except back through the root (which doesn’t count).

## descendant

Your children, grandchildren, great-grandchildren, etc., all the way to the leaves. B’s descendants are C, D, and E, while A’s are F, B, C, D, and E.

## leaf

A node with no children. F, D, and E are leaves. Note that in a (very) small tree, the root could itself be a leaf.

## internal node

Any node that’s not a leaf. A, B, and C are the internal nodes in our example.

## depth (of a node)

A node’s depth is the distance (in number of nodes) from it to the root. The root itself has depth zero. In our example, B is of depth 1, E is of depth 3, and A is of depth 0.

## height (of a tree)

A rooted tree’s height is the maximum depth of any of its nodes; i.e., the maximum distance from the root to any node. Our example has a height of 3, since the “deepest” nodes are D and E, each with a depth of 3. A tree with just one node is considered to have a height of 0. Bizarrely, but to be consistent, we’ll say that the empty tree has height -1! Strange, but what else could it be? To say it has height 0 seems inconsistent with a one-node tree also having height 0. At any rate, this won’t come up much.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 126

Context: # Chapter 5. Structures

Subtree. This visits O followed by I (since O has no left subtree either) which finally returns back up the ladder.

It's at this point where it's easy to get lost. We finish visiting I, and then we have to ask "okay, where the heck were we? How did we get here?" The answer is that we had just been at the K node, where we had traversed its (D) subtree. So now what is it time to do? Traverse the right subtree, of course, which is M. This involves visiting M, C, and E (in that order) before returning to the very top, G.

Now we're in the same sort of situation where we could have gotten lost before: we've spent a lot of time in the tangled mess of G's left subtree, and we just have to remember that it's now time to do G's right subtree. Follow this same procedure, and the entire order of visitation ends up being: G, K, D, O, I, M, C, E, H, A, B, F, N, L. (See Figure 5.18 for a visual.)

![Figure 5.18: The order of node visitation in pre-order traversal.](#)

## To traverse a tree post-order, we:
1. Treat the left child and all its descendants as a subtree, and traverse it in its entirety.
2. Do the same with the right child.
3. Visit the root.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 130

Context: # CHAPTER 5. STRUCTURES

## Figure 5.21: A full binary tree

```
       G
     /   \
    K     H
   / \   / \
  D   M A   B
 / \ / \ |   |
C  E F   N   L
   |
   I
```

## Complete binary tree

A complete binary tree is one in which every level has all possible nodes present, except perhaps for the deepest level, which is filled all the way from the left. Figure 5.21 is not complete, but it would be if we fixed it up as in Figure 5.22.

## Figure 5.22: A complete binary tree

```
       G
     /   \
    K     H
   / \   / \
  D   M A   B
 / \   |   |
I   E  L   F
         / \
        N   -
```

Unlike full binary trees, it is always possible to have a complete binary tree no matter how many nodes it contains. You just keep filling in from left to right, level after level.

## Perfect binary tree

Our last special type has a rather audacious title, but a “perfect” tree is simply one that is exactly balanced.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 133

Context: # 5.2. TREES

## Binary Search Trees (BSTs)

Okay, then let’s talk about how to arrange those contents. A binary search tree (BST) is any binary tree that satisfies one additional property: every node is “greater than” all of the nodes in its left subtree, and “less than (or equal to)” all of the nodes in its right subtree. We’ll call this the **BST property**. The phrases “greater than” and “less than” are in quotes here because their meaning is somewhat flexible, depending on what we’re storing in the tree. If we’re storing numbers, we’ll use numerical order. If we’re storing names, we’ll use alphabetical order. Whatever it is we’re storing, we simply need a way to compare two nodes to determine which one “goes before” the other.

An example of a BST containing people is given in Figure 5.24. Imagine that each of these nodes contains a good deal of information about a particular person—an employee record, medical history, account information, what have you. The nodes themselves are indexed by the person’s name, and the nodes are organized according to the BST rule. Mitch comes after Ben/Jessica/Jim and before Randy/Olson/Molly/Xander in alphabetical order, and this ordering relationship between parents and children repeats itself all the way down the tree. (Check it!)

Be careful to observe that the ordering rule applies between a node and **the entire contents of its subtrees**, not merely to its immediate children. This is a rookie mistake that you want to avoid. Your first instinct, when glancing at Figure 5.25, below, is to judge it a BST. It is not a binary search tree, however! Jessica is to the left of Mitch, as she should be, and Nancy is to the right of Jessica, as she should be. It seems to check out. But the problem is that Nancy is a descendant of Mitch’s left subtree, whereas she must properly be placed somewhere in his right subtree. And, this matters. So be sure to check your BSTs all the way up and down.

## The Power of BSTs

All right, so what’s all the buzz about BSTs, anyway? The key insight is to realize that if you’re looking for a node, all you have to do is start at the root and go the height of the tree down, making...
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 139

Context: # 5.2. TREES

subtree – namely, Jim – and promote him in place of her. Figure 5.28 shows the result. Note that we replaced her with Jim not because it’s okay to blindly promote her right child, but because Jim had no left descendants, and hence he was the left-most node in her right subtree. (If he had left descendants, promoting him would have been just as wrong as promoting Ben. Instead, we would have gone left from Jim until we couldn’t go left anymore, and promoted that node.)

![Figure 5.28: The BST after removing Jessica correctly.](path_to_image)

As another example, let’s go whole-hog and remove the root node, Mitch. The result is as shown in Figure 5.29. It’s rags-to-riches for Molly: she got promoted from a leaf all the way to the top. Why Molly? Because she was the left-most descendant of Mitch’s right subtree.

To see why this works, just consider that Molly was immediately after Mitch in alphabetical order. The fact that he was a king and she a peasant was misleading. The two of them were actually very close: consecutive, in fact, with in-order traversal. So replacing Mitch with Molly avoids shuffling anybody out of alphabetical order and preserves the all-important BST property.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 144

Context: # CHAPTER 5. STRUCTURES

10. How many vertices and edges are there in the graph below?

   ![Graph](image_reference_placeholder)

   **7 and 10, respectively.**

11. What's the degree of vertex L?  
   **It has an in-degree of 2, and an out-degree of 1.**

12. Is this graph directed?  
   **Yes.**

13. Is this graph connected?  
   **Depends on what we mean. There are two different notions of “connectedness” for directed graphs. One is strongly connected, which means every vertex is reachable from any other by following the arrows in their specified directions. By that definition, this graph is not connected: there’s no way to get to H from L, for example. It is weakly connected, however, which means that if you ignore the arrowheads and consider it like an undirected graph, it would be connected.**

14. Is it a tree?  
   **No. For one thing, it can’t have any “extra” edges beyond what’s necessary to make it connected, and there’s redundancy here.**

15. Is it a DAG?  
   **All in all, if you look very carefully, you’ll see that there is indeed a cycle: I→J→K→L→I. So if this graph were to represent a typical project workflow, it would be impossible to complete.**

16. If we reversed the direction of the I→J edge, would it be a DAG?  
   **Yes. The edges could now be completed in this order: H, G, L, I, M, K, and finally J.**
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 145

Context: # 5.4. EXERCISES

17. If this graph represented an endorelation, how many ordered pairs would it have?  
   **Answer:** 10.

18. Suppose we traversed the graph below in depth-first fashion, starting with node P. In what order would we visit the nodes?

   ![Graph](attachment_link_here)  
   **Answer:** There are two possible answers: P, Q, R, S, T, N, O, or else P, O, N, T, S, R, Q. (The choice just depends on whether we go "left" or "right" initially.) Note in particular that either O or Q is at the very end of the list.

19. Now we traverse the same graph breadth-first fashion, starting with node P. Now in what order would we visit the nodes?

   **Answer:** Again, two possible answers: P, O, Q, N, R, T, S, or else P, Q, R, N, S, T. Note in particular that both O and Q are visited very early.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 147

Context: # 5.4. EXERCISES

25. **Is it a binary search tree?**  
   No. Although nearly every node does satisfy the BST property (all the nodes in its left subtree come before it alphabetically, and all the nodes in its right subtree come after it), there is a single exception: Zoe is in Wash's left subtree, whereas she should be to his right.  
   Many ways; one would be to swap Zoe's and Wash's positions. If we do that, the fixed tree would be:

   ```
       Mal
      /   \
    Jayne  Zoe
    /  \   /
  Inara Kayle
           \
           River
             \
             Simon
               \
               Wash
   ```

26. **How could we fix it?**  
   Take a moment and convince yourself that every node of this new tree does in fact satisfy the BST property. It's not too bad, but it does have one too many levels in it (it has a height of 4, whereas all its nodes would fit in a tree of height 3).

27. **Is the tree balanced?**  
   Many ways; one would be to rotate the River-Simon-Wash subtree so that Simon becomes Zoe's left child. Simon would then be the parent of River (on his left) and Wash (on his right).

28. **How could we make it more balanced?**  
   Suggested approaches include reviewing the structure to ensure the depths of all leaves are as equal as possible or implementing rotations to balance out the heights of subtrees.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 150

Context: # 6.1 The Fundamental Theorem

We start with a basic rule that goes by the audacious name of **The Fundamental Theorem of Counting**. It goes like this:

If a whole can be divided into \( k \) parts, and there’s \( n_i \) choices for the \( i \)-th part, then there’s \( n_1 \times n_2 \times n_3 \times \cdots \times n_k \) ways of doing the whole thing.

## Example

Jane is ordering a new Lamborghini. She has twelve different paint colors to choose from (including Luscious Red and Sassy Yellow), three different interiors (Premium Leather, Bonded Leather, or Vinyl), and three different stereo systems. She must also choose between automatic and manual transmission, and she can get power locks & windows (or not). How many different configurations does Jane have to choose from? 

To put it another way, how many different kinds of cars could come off the line for her?

The key is that every one of her choices is independent of all the others. Choosing an Envious Green exterior doesn't constrain her choice of transmission, stereo, or anything else. So no matter which of the 12 paint colors she chooses, she can independently choose any of the three interiors, and no matter what these first two choices are, she can freely choose any of the stereos, etc. It’s a mix-and-match. Therefore, the answer is:

\[ 
12 \times 3 \times 3 \times 2 = 432 
\]

Here’s an alternate notation you’ll run into for this, by the way:

**How many other "Fundamental Theorems" of math do you know?** Here are a few: the Fundamental Theorem of Arithmetic says that any natural number can be broken down into its prime factors in only one way. The Fundamental Theorem of Algebra says that the highest power of a polynomial is how many roots (zeros) it has. The Fundamental Theorem of Linear Algebra says that the row space and the column space of a matrix have the same dimension. The Fundamental Theorem of Calculus says that integration and differentiation are the inverses of each other.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 154

Context: # CHAPTER 6. COUNTING

or only about .006 times as many as before. Better stick with alphanumeric characters for all seven positions.

## A Simple Trick

Sometimes we have something difficult to count, but we can turn it around in terms of something much easier. Often this involves counting the complement of something, then subtracting from the total.

For instance, suppose a certain website mandated that user passwords be between 6-10 characters in length — every character being an uppercase letter, lowercase letter, digit, or special character (*, #, $, &, or @) — but it also required each password to have at least one digit or special character. How many passwords are possible?

Without the “at least one digit or special character” part, it’s pretty easy: there are 26 + 26 + 10 + 5 = 67 different choices for each character, so we have:

```
67¹⁰ + 67⁹ + 67⁸ + 67⁷ + 67⁶ = 1,850,456,557,795,600,384 strings.
```

But how do we handle the “at least one” part?

One way would be to list all the possible ways of having a password with at least one non-alpha character. The non-alpha could appear in the first position, or the second, or the third, ..., or the tenth, but of course this only works for 10-digit passwords, and in any event it’s not like the other characters couldn’t also be non-alpha. It gets messy really fast.

There’s a simple trick, though, once you realize that it’s easy to count the passwords that don’t satisfy the extra constraint. Ask yourself this question: out of all the possible strings of 6-10 characters, how many of them don’t have at least one non-alpha character? (and are therefore illegal, according to the website rules?)

It turns out that’s the same as asking “how many strings are there with 6-10 alphabetic (only) characters?” which is of course:

```
52⁶ + 52⁷ + 52⁸ + 52⁹ + 52¹⁰ = 147,389,619,103,536,384 (illegal) passwords.
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 156

Context: # CHAPTER 6. COUNTING

This pattern comes up so much that mathematicians have established a special notation for it:

\[ n \times (n - 1) \times (n - 2) \cdots \times 1 = n! \; (\text{``n-factorial''}) \]

We say there are “3-factorial” different brushing orders for the Davies kids. For our purposes, the notion of factorial will only apply for integers, so there’s no such thing as 23.46! or π! (In advanced computer science applications, however, mathematicians sometimes do define factorial for non-integers.) We also define 0! to be 1, which might surprise you.

This comes up a heck of a lot. If I give you a jumbled set of letters to unscramble, like “KRBS” (think of the Jumble® word game in the newspaper), how many different unscramblings are there? The answer is 5! or 120, one of which is BRISK. Let's say I shuffle a deck of cards before playing War. How many different games of War are there? The answer is 52!, since any of the cards in the deck might be shuffled on top, then any but that top card could be second, then any of those two could be third, etc. Ten packets arrive near-simultaneously at a network router. How many ways can they be queued up for transmission? 10! ways, just like a larger Davies family.

The factorial function grows really, really fast, by the way, even faster than exponential functions. A five letter word like “BRISK” has 120 permutations, but “AMBIENTDROUSLY” has 87,178,291,200, ten times the population of the earth. The number of ways to shuffle a deck is 

\[ 52! \approx 8.065 \times 10^{67} \]

so I don’t think my boys will end up playing the same War game twice any time soon, nor my wife and I the same bridge hand.

---

**“War”** is a mindless card game which involves no strategy or decision-making on the part of the players. Once you shuffle the initial deck, the entire outcome of the game is fixed.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 157

Context: # 6.2. PERMUTATIONS 

## Enumerating permutations

We've discovered that there are 120 permutations of BRISK, but how would we go about listing them all? You can play around with the Davies kids and stumble upon all 6 permutations, but for larger numbers it’s harder. We need a systematic way.

Two of the easiest ways to enumerate permutations involve recursion. Here’s one:

### Algorithm #1 for enumerating permutations

1. Begin with a set of \( n\) objects.
   a. If \( n = 1\), there is only one permutation; namely, the object itself.  
   b. Otherwise, remove one of the objects, and find the permutations of the remaining \( n - 1\) objects. Then, insert the removed object at every possible position, creating another permutation each time.

As always with recursion, solving a bigger problem depends on solving smaller problems. Let’s start with RISK. We’ve already discovered from the toothbrushing example that the permutations of ISK are ISK, IKS, SIK, SKI, KIS, and KSI. So to find the permutations of RISK, we insert an R into each possible location for each of these ISK-permutations. This gives us:

- RISK  
- RISK  
- TRSK  
- TSKR  
- ISRK  
- IKS  
- TRKS  
- TRKS  
- IKRS  
- KIS  
- IKS  
- RSK  
- RSTK  
- ...
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 159

Context: ```
# 6.2. PERMUTATIONS

## Example with 3-letter words

| R | S | K |
|---|---|---|
| R | K | S |
| R | S | I |
| I | R | S |
| I | S | K |
| I | K | R |
| S | R | I |
| S | I | K |
| ... | ... | ... |

Then, for the 5-letter word:

| B | R | I | S | K |
|---|---|---|---|---|
| B | R | I | K | S |
| B | S | K | I | R |
| B | K | I | S | R |
| B | K | S | I | R |
| B | I | R | S | K |
| B | I | K | R | S |
| ... | ... | ... | ... | ... |

## Partial permutations

Sometimes we want to count the permutations of a set, but only want to choose some of the items each time, not all of them. For example, consider a golf tournament in which the top ten finishers (out of 45) all receive prize money, with the first-place winner receiving the most, the second-place finisher a lesser amount, and so on down to tenth place, who receives a nominal prize. How many different finishes are possible to the tournament?

In this case, we want to know how many different orderings of golfers there are, but it turns out that past tenth place, we don’t care what order they finished in. All that matters is the first ten places. If the top ten are 1. Tiger, 2. Phil, 3. Lee, 4. Rory, ... and ...
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 167

Context: # 6.4 Summary

Most of the time, counting problems all boil down to a variation of one of the following three basic situations:

- \( n^k \) — this is when we have \( k \) different things, each of which is free to take on one of \( n \) completely independent choices.

- \( \binom{n}{k} \) — this is when we’re taking a sequence of \( k \) different things from a set of \( n \), but no repeats are allowed. (A special case of this is \( n! \), when \( k = n \).)

- \( \binom{n}{k} \) — this is when we’re taking \( k \) different things from a set of \( n \), but the order doesn’t matter.

Sometimes it’s tricky to deduce exactly which of these three situations apply. You have to think carefully about the problem and ask yourself whether repeated values would be allowed and whether it matters what order the values appear in. This is often subtle.

As an example, suppose my friend and I work out at the same gym. This gym has 18 different weight machines to choose from, each of which exercises a different muscle group. Each morning,
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 171

Context: # 6.5. EXERCISES

1. To encourage rivalry and gluttony, we're going to give a special certificate to the child who collects the most candy at the end of the night. And while we're at it, we'll give 2nd-place and 3rd-place certificates as well. How many different ways could our 1st-2nd-3rd contest turn out?  
   **Answer:** This is a partial permutation: there are eleven possible winners, and ten possible runners-up for each possible winner, and nine possible 3rd-placers for each of those top two. The answer is therefore 114, or 900. Wow! I wouldn’t have guessed that high.

2. Finally, what if we want every kid to get a certificate with their name and place-of-finish on it? How many possibilities? (Assume no ties.)  
   **Answer:** This is now a full-blown permutation: 111. It comes to 30,916,800 different orders-of-finish, believe it or not. I told you: this counting stuff can explode fast.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 175

Context: # 7.1. WHAT IS A “NUMBER?”

When you think of a number, I want you to try to erase the sequence of digits from your mind. Think of a number as what is: a **quantity**. Here's what the number seventeen really looks like:

```
   8
 8 8 8
   8
```

It’s just an **amount**. There are more circles in that picture than in some pictures, and less than in others. But in no way is it “two digits,” nor do the particular digits “1” and “7” come into play any more or less than any other digits.

Let’s keep thinking about this. Consider this number, which I’ll label “A”:

(A)

Now let’s add another circle to it, creating a different number I’ll call “B”:

(B)

And finally, we’ll do it one more time to get “C”:

(C)

(Look carefully at those images and convince yourself that I added one circle each time.)

When going from A to B, I added one circle. When going from B to C, I also added one circle. Now I ask you: was going from B to C any more “significant” than going from A to B? Did anything qualitatively different happen?

The answer is obviously no. Adding a circle is adding a circle; there’s nothing more to it than that. But if you had been writing
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 178

Context: ```
# CHAPTER 7. NUMBERS

Of the number. Obviously, 239 is less than 932, so we say that the hundreds place is more significant than the other digits.

All of this probably seems pretty obvious to you. All right then. Let’s use a base other than ten and see how you do. Let’s write out a number in base 5. We have seven symbols at our disposal: 0, 1, 2, 3, 4, 5, and 6. Wait, you ask — why not 7? Because there is no digit for seven in a base 5 system, just like there is no digit for ten in base 10 system. Ten is the point where we need two digits in a decimal system, and analogously, seven is the point where we’ll need two digits in our base 7 system. How will we write the value seven? Just like this: 10. Now stare at those two digits and practice saying “seven” as you look at them. All your life you’ve been trained to say the number “ten” when you see the digits 1 and 0 printed like that. But those two digits only represent the number ten if you’re using a base 10 system. If you’re using a base 3/4 system, “10” is how you write “thirty-four.”

Very well, we have our seven symbols. Now how do we interpret a number like 6153₇? It’s this:

6153₇ = 6 × 7³ + 1 × 7² + 5 × 7¹ + 3 × 7⁰.

That doesn’t look so strange; it’s very parallel to the decimal string we expanded, above. It looks we can interpret when we actually multiply out the place values:

6153₇ = 6 × 343 + 1 × 49 + 5 × 7 + 3 × 1.

So in base 7, we have a “one’s place”, a “seven’s place”, a “forty-nine’s place”, and a “three hundred forty-three’s place.” This seems unbelievably bizarre — how could a number system possibly hold together with such place values? But I’ll bet it wouldn’t look funny at all if we had been born with 7 fingers. Keep in mind that in the equation above, we wrote out the place values as decimal numbers: Had we written them as base-7 numbers (as we certainly would have if base 7 was our natural numbering system), we would have written:

6153₇ = 6 × 100₇ + 1 × 10₇ + 5 × 7₇ + 3 × 1₇.
```
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 185

Context: # 7.4 BINARY (BASE 2)

yet, and so you started with four fingers held up, and counted off `1... 2... 3... 4... 5`, sticking up another finger each time. Then, you looked at your hands, and behold! nine fingers.

We'll do the same thing here: start with the number `D`, and count two additional places: `E... F`. The answer is `F`. That is the number that's two greater than `D`. Lucky for us, it still fits in one digit. So now we have:

```
  48D₁₆
+ 59₂₁₆
-------
  F₉₁₆
```

So far so good. The next pair of digits is `8 + 9`. Here's where you want to be careful. You're liable to look at `8+9` and immediately say `17`! But `8 + 9` is not `17` in hexadecimal. To figure out what it is, we start with the number `8` and count: `9... A... B... C... D... E... F... 10... 11...`. The answer is `11`, which of course is how you write "seventeen" in hex. So just like in grade school, we write down `1` and carry the `1`:

```
  1
  48D₁₆
+ 59₂₁₆
-------
  1F₉₁₆
```

Finally, our last digit is `4 + 5`, plus the carried `1`. We start with four and count off five: `5... 6... 7... 8... 9... 10... 11...`. Then we add the carry, and count `... A`. The answer is `A`, with no carry, and so we have our final answer:

```
  1
  48D₁₆
+ 59₂₁₆
-------
  A1F₉₁₆
```

## 7.4 Binary (base 2)

The other base we commonly use in computer science is base 2, or binary. This is because the basic unit of information in a computer is called a bit, which has only two values, conventionally called either "true" and "false" or "1" and "0". Numbers (as well
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 201

Context: # 7.5 EXERCISES

6. If I told you that 98,243,917,215 mod 7 was equal to 1, would you call me a liar without even having to think too hard?  
   No, you shouldn’t. It turns out that the answer is 3, not 1, but how would you know that without working hard for it?  

7. If I told you that 273,111,999,214 mod 6 was equal to 6, would you call me a liar without even having to think too hard?  
   Yes, you should. Any number mod 6 will be in the range 0 through 5, never 6 or above. (Think in terms of repeatedly taking out groups of six from the big number. The mod is the number of stones you have left when there are no more whole groups of six to take. If towards the end of this process there are six stones left, that’s not a remainder because you can get another whole group!)

8. Are the numbers 18 and 25 equal?  
   Of course not. Don’t waste my time.

9. Are the numbers 18 and 25 congruent mod 7?  
   Yes. If we take groups of 7 out of 18 stones, we’ll get two such groups (a total of 14 stones) and have 4 left over. And then, if we do that same with 25 stones, we’ll get three such groups (a total of 21 stones) and again have 4 left over. So they’re not congruent mod 7.

10. Are the numbers 18 and 25 congruent mod 6?  
    No. If we take groups of 6 out of 18 stones, we’ll get three such groups with nothing left over. But if we start with 25 stones, we’ll take out 4 such groups (for a total of 24 stones) and have one left over. So they’re not congruent mod 6.

11. Are the numbers 617,418 and 617,424 equal?  
    Of course not. Don’t waste my time.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 204

Context: I'm unable to process images directly. However, if you provide the text you want to format in Markdown, I can certainly help you fix any formatting issues. Please paste the text here.

Image Analysis: 

### Image Analysis Report

**Localization and Attribution:**
- The image appears to be a single diagram located at the center of the page.
- It is referred to as **Image 1**.

**Object Detection and Classification:**
- **Objects Detected:**
  - Boxes/rectangles connected by arrows.
  - Icons inside some boxes.

**Scene and Activity Analysis:**
- **Scene Description:**
  - The image is a flowchart depicting a process. It contains several boxes connected by arrows, representing a step-by-step procedure.
- **Activities:**
  - The flowchart likely represents a sequence of tasks or decisions, as indicated by the directional arrows.

**Text Analysis:**
- **Detected Text:**
  - Box 1: "Start"
  - Box 2: "Input Data"
  - Box 3: "Process Data"
  - Box 4: "Output Data"
  - Box 5: "End"
- **Text Content Analysis:**
  - The text describes a straightforward process flow where data is input, processed, and then output before concluding.

**Diagram and Chart Analysis:**
- **Diagram Analysis:**
  - The flowchart is visually structured to guide the reader from the starting point of a process to its conclusion.
  - Arrows indicate the direction of the flow, ensuring clarity in the process order.
- **Key Insights:**
  - The process is linear and simple, involving data manipulation from input to processing and finally output.

**Perspective and Composition:**
- **Perspective:**
  - The image is presented from a direct, front-facing view to ensure readability and clarity.
- **Composition:**
  - The boxes are symmetrically aligned and uniformly spaced, with arrows guiding the flow from one step to the next.

**Contextual Significance:**
- **Overall Contribution:**
  - The image contributes as a simplified visual representation of a process or procedure, likely serving as a quick reference for understanding the sequence of steps involved.

**Tables:**
- **Tabular Data Analysis:**
  - No tables are included in the image.

### Summary

**Image 1** features a flowchart that describes a step-by-step process involving the input, processing, and output of data, concluding with an end step. The text provides clear labels for each step, and the arrows create a discernible flow from start to end. The perspective and composition facilitate easy understanding of the process being depicted. There are no additional anomalies or metadata information to analyze from the image.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 206

Context: # CHAPTER 8. LOGIC

A statement that has a "truth value," which means that it is either true or false. The statement "all plants are living beings" could be a proposition, as could "Barack Obama was the first African-American President" and "Kim Kardashian will play the title role in Thor: Love and Thunder." By contrast, questions like "are you okay?" cannot be propositions, nor can commands like "hurry up and answer already!" or phrases like "Lynn's newborn schnauzer," because they are not statements that can be true or false. (Linguistically speaking, propositions have to be in the indicative mood.)

We normally use capital letters (what else?) to denote propositions, like:

- Let **A** be the proposition that UMW is in Virginia.
- Let **B** be the proposition that the King of England is female.
- Let **C** be the proposition that dogs are carnivores.

Don't forget that a proposition doesn't have to be true in order to be a valid proposition (B is still a proposition, for example). It just matters that it is labeled and that it has the potential to be true or false.

Propositions are considered atomic. This means that they are indivisible: to the logic system itself, or to a computer program, they are simply an opaque chunk of truth (or falsity) called "A" or whatever. When we humans read the description of A, we realize that it has to do with the location of a particular institution of higher education, and with the state of the union that it might reside (or not reside) in. All this is invisible to an artificially intelligent agent, however, which treats "A" as nothing more than a stand-in label for a statement that has no further discernible structure. 

So things are pretty boring so far. We can define and label propositions, but none of them have any connections to the others. We change that by introducing logical operators (also called logical connectives) with which we can build up compound constructions out of multiple propositions. The six connectives we’ll learn are:
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 219

Context: # 8.2. Predicate Logic

which is perfectly true[^1].

You may recall the word "predicate" from your middle school grammar class. Every sentence, remember, has a subject and a predicate. In “Billy jumps,” “Billy” is the subject, and “jumps” the predicate. In “The lonely boy ate spaghetti with gusto,” we have “the lonely boy” as the subject and “ate spaghetti with gusto” as the predicate. Basically, a predicate is anything that can describe or affirm something about a subject. Imagine asserting “JUMPS(Billy)” and “ATESPAGHETTIWITHGUSTO(lonely boy).”

A predicate can have more than one input. Suppose we define the predicate `IsFanOf` as follows:

Let `IsFanOf(x, y)` be the proposition that x digs the music of rock band y.

Then I can assert:

- `IsFanOf(Stephen, Led Zeppelin)`
- `IsFanOf(Rachel, The Beatles)`
- `IsFanOf(Stephen, The Beatles)`
- `¬IsFanOf(Stephen, The Rolling Stones)`

We could even define `TraveledToByModeInYear` with a bunch of inputs:

Let `TraveledToByModeInYear(p, d, m, y)` be the proposition that person p traveled to destination d by mode m in year y.

The following statements are then true:

- `TraveledToByModeInYear(Stephen, Richmond, car, 2017)`

> "By the way, when I say you can give any input at all to a predicate, I mean any individual element from the domain of discourse. I don’t mean that a set of elements can be an input. This limitation is why it’s called ‘first-order’ predicate logic. If you allow sets to be inputs to predicates, it’s called ‘second-order predicate logic’, and can get quite messy." 

[^1]: The footnote text if needed.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 221

Context: # 8.2. PREDICATE LOGIC

This asserts that for every \( x \), \( \text{HasGovernor}(x) \) is true. Actually, this isn’t quite right, for although Michigan and California have governors, mayonnaise does not. To be precise, we should say:

\[
\forall x \in S \, \text{HasGovernor}(x),
\]

where \( S \) is the set of all fifty states in the U.S.

We can use a quantifier for any complex expression, not just a simple predicate. For instance, if \( H \) is the set of all humans, then:

\[
\forall h \in H \, \text{Adult}(h) \oplus \text{Child}(h)
\]

states that every human is either an adult or a child, but not both. (Imagine drawing an arbitrary line at a person’s 18th birthday.) Another (more common) way to write this is to dispense with sets and define another predicate \( \text{Human} \). Then we can say:

\[
\forall h \, \text{Human}(h) \Rightarrow \text{Adult}(h) \oplus \text{Child}(h).
\]

Think this through carefully. We’re now asserting that this expression is true for all objects, whether they be Duchess Kate Middleton, little Prince Louis, or a bowl of oatmeal. To see that it’s true for all three, let’s first be equal to Kate Middleton. We substitute Kate for \( h \) and get:

\[
\text{Human}(\text{Kate}) \Rightarrow \text{Adult}(\text{Kate}) \oplus \text{Child}(\text{Kate})
\]

\[
\text{true} \Rightarrow \text{true} \oplus \text{false}
\]

\[
\text{true} \Rightarrow \text{true}
\]

\[
\text{true} \checkmark
\]

Remember that “implies” (\( \Rightarrow \)) is true as long as the premise (left-hand side) is false and/or the conclusion (right-hand side) is true. In this case, they’re both true, so we have a true result. Something similar happens for Prince Louis:

\[
\text{Human}(\text{Louis}) \Rightarrow \text{Adult}(\text{Louis}) \oplus \text{Child}(\text{Louis})
\]

\[
\text{true} \Rightarrow \text{false} \oplus \text{true}
\]

\[
\text{true} \Rightarrow \text{true}
\]

\[
\text{true} \checkmark
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 224

Context: # CHAPTER 8. LOGIC

## Order matters

When you're facing an intimidating morass of \( \forall \)s and \( \exists \)s and \( \vee \)s and \( \wedge \)s, and God knows what else, it's easy to get lost in the sauce. But you have to be very careful to dissect the expression to find out what it means. Consider this one:

\[
\forall x \in \mathbb{R}, \exists y \in \mathbb{R} \quad y = x + 1.
\] (8.5)

This statement is true. It says that for every single real number (call it \( x \)), it's true that you can find some other number (call it \( y \)) that's one greater than it. If you generate some examples it's easy to see this is true. Suppose we have the real number \( x = 5 \). Is there some other number \( y \) that's equal to \( 2 + x \)? Of course, the number 6. What if \( x = -32.4 \)? Is there a number \( y \) that satisfies this equation? Of course, \( y = -31.4 \). Obviously not matter what number \( x \) we choose, we can find the desired number \( y \) just by adding one. Hence this statement is true for all \( x \), just like it says.

What happens, though, if we innocently switch the order of the quantifiers? Let's try asserting this:

\[
\exists y \in \mathbb{R}, \forall x \in \mathbb{R} \quad y = x + 1.
\] (8.6)

Is this also true? Look carefully. It says “there exists some magic number \( y \) that has the following amazing property: no matter what value of \( x \) you choose, this \( y \) is one greater than \( x \)!” Obviously this is not true. There is no such number \( y \). If I choose \( y = 13 \), that works great as long as I choose \( x = 12 \), but for any other choice of \( x \), it’s dead in the water.

## The value of precision

This fluency with the basic syntax and meaning of predicate logic was our only goal in this chapter. There are all kinds of logical rules that can be applied to predicate logic statements in order to
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 225

Context: # 8.2. Predicate Logic

Deduce further statements, and you’ll learn about them when you study artificial intelligence later on. Most of them are formalized versions of common sense. “If you know A is true, and you know A=B is true, then you can conclude B is true.” Or “if you know X∧Y is false, and then you discover that Y is true, you can then conclude that X is false.” Etc. The power to produce new truth from existing truth is the hallmark of AI systems, and why this stuff really matters.

If you can imagine a program doing this sort of automated reasoning, it will become clear why the precision of something like predicate logic—instead of the sloppiness of English—becomes important. English is a beautiful and poetic language, but its ambiguity is notorious. For example, back in Chapter 3 we used the phrase “some employee belongs to every department” when describing relations. Now consider that English sentence. What does “some employee belongs to every department” actually mean? Does it mean that there is some special employee who happens to hold membership in every department in the company? Or does it mean that no department is empty: all departments have at least one person in them, for crying out loud? The English could mean either.

In predicate logic, we’re either asserting:

- \(\exists x \, \text{Employee}(x) \land \forall y \, \text{BelongsTo}(x, y)\)

or

- \(\forall y \, \text{Employee}(x) \land \text{BelongsTo}(x, y)\)

These are two very different things. A human being would realize that it’s the third one on the speaker’s mind, drawing from a whole range of experience and common sense and context clues. But a bot has available none of these, and so it demands that the language clearly and unambiguously state exactly what’s meant.

English is rife with these ambiguities, especially involving pronouns. 

> “After John hit George he ran away.” What happened? Did John run away after striking George, fearing that George would retaliate? Or did George run away after getting hit, fearing additional abuse? It’s unclear what “he” refers to, so we can’t say from the sentence alone.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 226

Context: # Chapter 8: Logic

Here's a funny one I’ll end with. Consider the sentence **“He made her duck.”** What is intended here? Did some guy reach out with his hand and forcefully push a woman’s head down out of the way of a screaming projectile? Or did he prepare a succulent dish of roasted fowl to celebrate her birthday? Oh, if the computer could only know. If we’d used predicate logic instead of English, it could!
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 229

Context: # 8.3. EXERCISES

28. True or false: ¬∀x Human(x).  
   **True.** This says “it’s not the case that everyone and everything is human.” And that certainly is not the case.

29. True or false: ∀x Human(x).  
   **False.** This says “nothing is human,” which is clearly not true. (Consider yourself as a counterexample).

30. True or false: ∃x ¬Human(x).  
   **True.** This says “there’s at least one thing in the universe which is not human.” (Consider your lunch).

31. True or false: ¬∃x Human(x).  
   **False.** This says “nothing is human,” just like item 29 did.

32. True or false: ∀x Human(x) → Professor(x).  
   **Not even close.** This says “everything in the universe is a human professor.” (Even though I would exist in such a world, what a world, indeed, as it would be).

33. True or false: ∀x Human(x) → Professor(x).  
   **False.** This says “every person is a professor.” (Consider LeBron James.) Keep in mind: “→” and “∧” don’t really play well together.

34. True or false: ∃x Professor(x) → Human(x).  
   **This is technically true, but for a stupid reason, and whoever wrote it almost certainly didn't intend what they wrote.** It says, “there’s at least one thing in the universe which either (a) isn’t a professor, or (b) if it is a professor, is also human.” Keep in mind: “∃” and “→” don’t really play well together. To drill this lesson home, realize that you could substitute almost any predicates for Professor(x) and Human(x) in that statement and it would still be true. (Try swapping out Professor(x) for Condition(x) and Human(x) for AstrologicalSign(x). Now try ∃ EuropeanUnion x: the statement is true. The U.S. is not a member, nor is it an astrological sign, so both sides of the implication are false, and never forget: false = false = true.)
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 234

Context: # CHAPTER 9. PROOF

That proposition B must be false. What else, if anything, can now be safely concluded from this?

It turns out that we can now conclude that F is also false. How do we know this? Here’s how:

1. Fact #4 says that either A or B (or both) is true. But we just discovered that B was false. So if it ain't B, it must be A, and therefore we conclude that A must be true. (For the curious, this rule of common sense is called a “disjunctive syllogism.”)

2. Now if A is true, we know that C must also be true, because fact #1 says that A implies C. So we conclude that C is true. (This one goes by the Latin phrase “modus ponens.”)

3. Fact #2 says \( C \land D \) must be false. But we just found out that C was true, so it must be D that’s false in order to make the conjunction false. So we conclude that D is false. (This is a disjunctive syllogism in disguise, combined with De Morgan’s law.)

4. Finally, fact #3 tells us that if either F were true or E were false, then that would imply that D would be true. But we just found out that D is false. Therefore, neither F nor ¬E can be true. (This step combines “modus tollens” with “disjunction elimination.”) So we conclude that F must be false.

Q.E.D.

(The letters “Q.E.D.” at the end of a proof stand for a Latin phrase meaning, “we just proved what we set out to prove.” It’s kind of a way to flex your muscles as you announce that you’ve done it.)

The things we’re allowed to start with are called axioms (or postulates). An axiom is a presupposition or definition that is given.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 236

Context: # CHAPTER 9. PROOF

## COLD

You start with one word (like **WARM**) and you have to come up with a sequence of words, each of which differs from the previous by only one letter, such that you eventually reach the ending word (like **COLD**). It's sort of like feeling around in the dark:

- **WARM**
- **WART**
- **WALT**
- **WILT**
- **WILD**
- ...  

This attempt seemed promising at first, but now it looks like it's going nowhere. ("**WOLD?**" "**CILD?**" Hmm...) After starting over and playing around with it for a while, you might stumble upon:

- **WARM**
- **WORM**
- **WORD**
- **CORD**
- **COLD**

This turned out to be a pretty direct path: for each step, the letter we changed was exactly what we needed it to be for the target word **COLD**. Sometimes, though, you have to meander away from the target a little bit to find a solution, like going from **BLACK** to **WHITE**:

- **BLACK**
- **CLACK**
- **CRACK**
- **TRACK**
- **TRICK**
- **TRICE**
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 237

Context: # 9.2. TYPES OF PROOF

## TRITE  

## WRITE  

## WHITE  

Here, we had to temporarily change our first letter three different times — two of which seemingly brought us no nearer to WHITE — in order to successfully forge a path through the tangled forest.

Knowing which direction to set out on is a matter of intuition plus trial and error. Given the axioms of any system (whether algebra, predicate logic, sets, etc.), there are an unfathomable number of different ways to proceed. The vast majority of them are bound to lead to dead ends. This is why a valid proof, when it is finished, is often an elegant and beautiful thing. It’s a thin braid of jewels glistening in the midst of a whole lot of mud.

## Indirect proof

Also known as a proof by contradiction or reductio ad absurdum, the indirect proof starts in a completely opposite way. It says, “Okay, I’m trying to prove X. Well, suppose for the sake of argument I assume that the opposite — not X — is true. Where would that lead me?” If you follow all the rules and it leads you to a contradiction, this tells you that the original assumption of ¬X must have been false. And this in turn proves that X must be true.

We do this all the time in our thinking. Say you’re driving down the highway. How do you know that the alternator in your engine is working? A direct proof would require that you open the hood and examine the part, testing to ensure it works properly. An indirect proof simply says, “Well, suppose it weren’t working properly. Then my car engine wouldn’t operate. But here I am, driving down the road, and the engine obviously does operate, so that tells me that the alternator must be working properly.”

One of the most famous indirect proof dates from Euclid’s *Elements* in 300 B.C. It proves that the square root of 2 is an irrational number, a great surprise to mathematicians at the time (most of whom doubted the very existence of irrational numbers). Remember...
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 240

Context: 232  
# CHAPTER 9. PROOF  

“recursion,” because this is exactly what it is. Remember that we discussed recursion in the context of rooted trees (see p.116). A tree can be thought of as a node with several children—each of which are, in turn, trees. Each of them is the root node of a tree comprised of yet smaller trees, and so on and so forth. If you flip back to the left-hand side of Figure 5.16 on p.113, you’ll see that A is the root of one tree, and its two children, F and B, are roots of their own smaller trees in turn. If we were to traverse this tree in (say) pre-order, we’d visit the root, then visit the left and right subtrees in turn, treating each of them as their own tree. In this way we’ve broken up a larger problem (traversing the big tree) into smaller problems (traversing the smaller trees F and B). The A node has very little to do: it just visits itself, then defers all the rest of the work onto its children. This idea of pawning off most of the work onto smaller subproblems that you trust will work is key to the idea of inductive proofs.

Mathematical induction is hard to wrap your head around because it feels like cheating. It seems like you never actually prove anything: you defer all the work to someone else, and then declare victory. But the chain of reasoning, though delicate, is strong as iron.

## Casting the problem in the right form  

Let’s examine that chain. The first thing you have to be able to do is express the thing you’re trying to prove as a predicate about natural numbers. In other words, you need to form a predicate that has one input, which is a natural number. You’re setting yourself up to prove that the predicate is true for all natural numbers. (Or at least, all natural numbers of at least a certain size.)

Suppose I want to prove that in the state of Virginia, all legal drinkers can vote. Then I could say “let VOTE(n) be the proposition that a citizen of age n can vote.”

If I want to prove an algebraic identity, like  

\[
\sum_{i=1}^{n} \frac{i}{n+1} = \frac{n(n+1)}{2}, 
\]  

then I have to figure out which variable is the one that needs to vary across the natural numbers. In this case it’s the \( n \) variable in my equation.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 244

Context: # Example 2

A famous story tells of Carl Friedrich Gauss, perhaps the most brilliant mathematician of all time, getting in trouble one day as a schoolboy. As punishment, he was sentenced to tedious work: adding together all the numbers from 1 to 100. To his teacher's astonishment, he came up with the correct answer in a moment, not because he was quick at adding integers, but because he recognized a trick. The first number on the list (1) and the last (100) add up to 101. So do 3 and 98, and so do 4 and 97, etc., all the way up to 50 and 51. So really what you have here is 50 different sums of 101 each, so the answer is \( 50 \times 101 = 5050 \). In general, if you add the numbers from 1 to \( x \), where \( x \) is any integer at all, you'll get \( \frac{x(x + 1)}{2} \).

Now, use mathematical induction to prove that Gauss was right (i.e., that \( \sum_{i=1}^{x} i = \frac{x(x + 1)}{2} \)) for all numbers \( x \).

First, we have to cast our problem as a predicate about natural numbers. This is easy: say “let \( P(n) \) be the proposition that 

\[
\sum_{i=1}^{n} i = \frac{n(n + 1)}{2} 
\]

Then, we satisfy the requirements of induction:

1. **Base Case**: We prove that \( P(1) \) is true simply by plugging it in. Setting \( n = 1 \), we have:

   \[
   \sum_{i=1}^{1} i = \frac{1(1 + 1)}{2}
   \]
   \[
   1 = \frac{1(2)}{2}
   \]
   \[
   1 = 1 \quad \text{✓}
   \]

2. **Inductive Step**: We now must prove that \( P(k) \) implies \( P(k + 1) \). Put another way, we assume \( P(k) \) is true, and then use that assumption to prove that \( P(k + 1) \) is also true.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 246

Context: # CHAPTER 9. PROOF

1. **Base Case**. We prove that \( P(1) \) is true simply by plugging it in. Setting \( n = 1 \), we have
   \[
   (ab)^1 = a^1b^1
   \]
   \( ab = ab \) ✓

2. **Inductive Step**. We now must prove that \( P(k) \) = \( P(k + 1) \). Put another way, we assume \( P(k) \) is true, and then use that assumption to prove that \( P(k + 1) \) is also true.

   Let’s be crystal clear where we’re going with this. Assuming \( P(k) \) is true means we can count on the fact that
   \[
   (ab)^k = a^k b^k.
   \]
   What we need to do, then, is prove that \( P(k + 1) \) is true, which amounts to proving that
   \[
   (ab)^{k+1} = a^{k+1} b^{k+1}.
   \]
   Now we know by the very definition of exponents that:
   \[
   (ab)^{k+1} = ab(ab)^k.
   \]
   Adding in our inductive hypothesis then lets us determine:
   \[
   (ab)^{k+1} = ab(ab)^k = ab \cdot a^k \cdot b^k = a^{k+1} \cdot b^{k} \cdot b = a^{k+1} b^{k+1} ✓
   \]

3. **Conclusion**. Therefore, for all \( n \geq 1 \), \( P(n) \) is true.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 252

Context: # 9.4 Final Word

Finding proofs is an art. In some ways, it’s like programming: you have a set of building blocks, each one defined very precisely, and your goal is to figure out how to assemble those blocks into a structure that starts with only axioms and ends with your conclusion. It takes skill, patience, practice, and sometimes a little bit of luck.

Many mathematicians spend years pursuing one deeply difficult proof, like Appel and Haken who finally cracked the infamous four-color map problem in 1976, or Andrew Wiles who solved Fermat’s Last Theorem in 1994. Some famous mathematical properties may never have proofs, such as Christian Goldbach’s 1742 conjecture that every even integer is the sum of two primes; or the most elusive and important question in computing theory: does P=NP? (Put very simply: if you consider the class of problems where it’s easy to verify a solution once you have it, but crazy hard to find it in the first place, is there actually an easy algorithm for finding the solution that we just haven’t figured out yet?) Most computer scientists think “no,” but despite a mind-boggling number of hours invested by the brightest minds in the world, no one has ever been able to prove it one way or the other.

Most practicing computer scientists spend time taking advantage of the known results about mathematical objects and structures, and rarely (if ever) have to construct a watertight proof about them. For the more theoretically-minded student, however, who enjoys probing the basis behind the tools and speculating about additional properties that might exist, devising proofs is an essential skill that can also be very rewarding.
####################
File: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf

Page: 257

Context: 9.4. FINAL WORD
=================

- ordered triples, 15
- org charts, 113
- outcomes, 60, 62
- overflow, 188

P = NP?, 244  
parent (of a node), 114  
partial orders, 43  
partial permutations, 151, 154  
partitions, 26, 71, 94  
Pascal's triangle, 157  
passwords, 146  
paths (in a graph), 88, 113  
perfect binary tree, 122, 239  
permutations, 147  
PINs, 143  
poker, 160  
pop (off a stack), 99  
posts, 43  
post-order traversal, 118  
postulates, 226  
power sets, 24, 36  
pre-order traversal, 117  
predicates, 210, 211, 232  
predicate logic, 210  
premise (of implication), 200  
Prim's algorithm, 107  
Prime, Robert, 107  
prior probability, 68  
probability measures, 61, 63, 65  
product operator (II), 142, 160  
proof, 223  
proof by contradiction, 229  
propositional logic, 197, 225  
propositions, 197, 210  

- psychology, 70, 86
- push (on a stack), 99
- quantifiers, 212, 215
- queue, 95, 97
- quotient, 173, 174

- range (of a function), 48
- rational numbers (ℝ), 17, 24
- reachable, 89
- real numbers (ℝ), 71, 94
- rebalancing (a tree), 132
- recursion, 116, 120, 149, 231
- red-black trees, 133
- reflexive (relation), 40, 43
- relations, finite, 39
- relations, infinite, 39
- remainder, 173, 174
- right child, 116  
- root (of a tree), 112, 114  
- rooted trees, 112, 134  
- Russell's paradox, 15  

- sample space (Ω), 60  
- semantic network, 87  
- set operators, 18  
- set-builder notation, 11  
- sets, 8, 93  
- sets of sets, 15  
- sets, finite, 12  
- sets, fuzzy, 10  
- sets, infinite, 12  
- sibling (of a node), 115  
- signs-magnitude binary numbers, 183, 189  

**Sonic the Hedgehog**, 73  
- southern states, 72  
- spatial positioning, 92, 113
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 1

Context: # A Brief Introduction to Machine Learning for Engineers

(2018), “A Brief Introduction to Machine Learning for Engineers”, Vol. XX, No. XX, pp 1–231. DOI: XXX.

Osvaldo Simeone  
Department of Informatics  
King’s College London  
osvaldo.simeone@kcl.ac.uk
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 2

Context: # Contents

I Basics 5

1 Introduction 6  
   1.1 What is Machine Learning? 6  
   1.2 When to Use Machine Learning? 8  
   1.3 Goals and Outline 11  

2 A Gentle Introduction through Linear Regression 15  
   2.1 Supervised Learning 15  
   2.2 Inference 17  
   2.3 Frequentist Approach 19  
   2.4 Bayesian Approach 36  
   2.5 Minimum Description Length (MDL) 42  
   2.6 Information-Theoretic Metrics 44  
   2.7 Interpretation and Causality 47  
   2.8 Summary 49  

3 Probabilistic Models for Learning 51  
   3.1 Preliminaries 52  
   3.2 The Exponential Family 53  
   3.3 Frequentist Learning 59
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 11

Context: # Part I
## Basics
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 13

Context: # 1.1. What is Machine Learning?

This starts with an in-depth analysis of the problem domain, which culminates with the definition of a mathematical model. The mathematical model is meant to capture the key features of the problem under study and is typically the result of the work of a number of experts. The mathematical model is finally leveraged to derive hand-crafted solutions to the problem.

For instance, consider the problem of defining a chemical process to produce a given molecule. The conventional flow requires chemists to leverage their knowledge of models that predict the outcome of individual chemical reactions, in order to craft a sequence of suitable steps that synthesize the desired molecule. Another example is the design of speech translation or image/video compression algorithms. Both of these tasks involve the definition of models and algorithms by teams of experts, such as linguists, psychologists, and signal processing practitioners, not infrequently during the course of long standardization meetings.

The engineering design flow outlined above may be too costly and inefficient for problems in which faster or less expensive solutions are desirable. The machine learning alternative is to collect large data sets, e.g., of labeled speech, images, or videos, and to use this information to train general-purpose learning machines to carry out the desired task. While the standard engineering flow relies on domain knowledge and on design optimized for the problem at hand, machine learning lets large amounts of data dictate algorithms and solutions. To this end, rather than requiring a precise model of the set-up under study, machine learning requires the specification of an objective, of a model to be trained, and of an optimization technique.

Returning to the first example above, a machine learning approach would proceed by training a general-purpose machine to predict the outcome of known chemical reactions based on a large data set, and then by using the trained algorithm to explore ways to produce more complex molecules. In a similar manner, large data sets of images or videos would be used to train a general-purpose algorithm with the aim of obtaining compressed representations from which the original input can be recovered with some distortion.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 17

Context: # 1.3 Goals and Outline

This monograph considers only passive and offline learning.

## 1.3 Goals and Outline

This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. Unlike other texts that are focused on one particular aspect of the field, an effort has been made here to provide a broad but concise overview in which the main ideas and techniques are systematically presented. Specifically, the material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra. A prior exposure to information theory is useful but not required.

Detailed discussions are provided on basic concepts and ideas, including overfitting and generalization, Maximum Likelihood and regularization, and Bayesian inference. The text also endeavors to provide intuitive explanations and pointers to advanced topics and research directions. Sections and subsections containing more advanced material that may be regarded at a first reading are marked with a star (+).

The reader will find here further discussions on computing platform or programming frameworks, such as map-reduce, nor details on specific applications involving large data sets. These can be easily found in a vast and growing body of work. Furthermore, rather than providing exhaustive details on the existing myriad solutions in each specific category, techniques have been selected that are useful to illustrate the most salient aspects. Historical notes have also been provided only for a few selected milestone events.

Finally, the monograph attempts to strike a balance between the algorithmic and theoretical viewpoints. In particular, all learning algorithms are presented in a manner that emphasizes their theoretical foundations while also providing practical insights.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 19

Context: # 1.3 Goals and Outline

Generative Adversarial Networks (GANs) are also introduced.

The fourth part covers more advanced modelling and inference approaches. Chapter 7 provides an introduction to probabilistic graphical models, namely Bayesian Networks (BNs) and Markov Random Fields (MRFs), as means to encode more complex probabilistic dependencies than the models studied in previous chapters. Approximate inference and learning methods are introduced in Chapter 8 by focusing on Monte Carlo (MC) and Variational Inference (VI) techniques. The chapter briefly introduces a unified way techniques such as variational EM, Variational AutoEncoders (VAE), and black-box inference. Some concluding remarks are provided in the last part, consisting of Chapter 9.

We conclude this chapter by emphasizing the importance of probability as a common language for the definition of learning algorithms [34]. The centrality of the probabilistic viewpoint was not always recognized, but has deep historical roots. This is demonstrated by the following two quotes, the first from the first AI textbook published by P. H. Winston in 1977, and the second from an unfinished manuscript by J. von Neumann (see [126, 64] for more information):

“Many ancient Greeks supported Socrates upon that deep, inexplicable thoughts came from the gods. Today’s equivalent to those gods is the erratic, even probabilistic neuron. It is more likely that increased randomness of neural behavior is the problem of the epileptic and the drunk, not the advantage of the brilliant.”

from *Artificial Intelligence*, 1977.

“Of all this will lead to theories of computation which are much less rigidly of an all-or-none nature than past and present formal logic. There are numerous indications to make us believe that this new system of formal logic will move closer to another discipline which has been little linked in the past with logic. This is thermodynamics primarily in the form it was received from Boltzmann.”
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 20

Context: # Introduction

from *The Computer and the Brain*, 1958.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 50

Context: ```
44
# A Gentle Introduction through Linear Regression

Produces a description of approximately 
$$ - \sum_{n=1}^{N} \log{p(y_n | x_n, w_{ML}, \beta_{ML})} $$
bits. This description is, however, not sufficient, since the decoder of the description should also be informed about the parameters $(w_{ML}, \beta_{ML})$.

Using quantization, the parameters can be described by means of a number $C(M)$ of bits that is proportional to the number of parameters, here $M + 2$. Concatenating these bits with the description produced by the ML model yields the overall description length

$$ - \sum_{n=1}^{N} \log{p[y_n | x_n, w_{ML}, \beta_{ML}]} + C(M). \quad (2.39) $$

MDL – in the simplified form discussed here – selects the model order $M$ that minimizes the description length (2.39). Accordingly, the term $C(M)$ acts as a regularizer. The optimal value of $M$ for the MDL criterion is hence the result of the trade-off between the minimization of the overhead $C(M)$, which calls for a small value of $M$, and the minimization of the NLL, which decreases with $M$.

Under some technical assumptions, the overhead term can be often evaluated in the form 
$$ \left( \frac{K}{2} \right) N \log N + c, $$ 
where $K$ is the number of parameters in the model and $c$ is a constant. This expression is not quite useful in practice, but it provides intuition about the mechanism used by MDL to combat overfitting.

## 2.6 Information-Theoretic Metrics

We now provide a brief introduction to information-theoretic metrics by leveraging the example studied in this chapter. As we will see in the following chapters, information-theoretic metrics are used extensively in the definition of learning algorithms. Appendix A provides a detailed introduction to information-theoretic measures in terms of inferential tasks. Here we introduce the key metrics of Kullback-Leibler (KL) divergence and entropy by examining the asymptotic behavior of ML in the regime of large $N$. The case with finite $N$ is covered in Chapter 6 (see Sec. 6.4.3).
```
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 80

Context: # Probabilistic Models for Learning

## 3.9 Summary

In this chapter, we have reviewed an important class of probabilistic models that are widely used as components in learning algorithms for both supervised and unsupervised learning tasks. Among the key properties of members of this class, known as the exponential family, are the simple form taken by the gradient of the log-likelihood (LL), as well as the availability of conjugate priors in the same family for Bayesian inference. 

An extensive list of distributions in the exponential family along with corresponding sufficient statistics, measure functions, log-partition functions, and mappings between natural and mean parameters can be found in [156]. More complex examples include the Restricted Boltzmann Machines (RBMs) to be discussed in Chapter 6 and Chapter 8. It is worth mentioning that there are also distributions not in the exponential family, such as the uniform distribution parameterized by its support. The chapter also covered the important idea of applying exponential models to supervised learning via Generalized Linear Models (GLMs). Energy-based models were finally discussed as an advanced topic.

The next chapter will present various applications of models in the exponential family to classification problems.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 82

Context: # 4
## Classification

The previous chapters have covered important background material on learning and probabilistic models. In this chapter, we use the principles and ideas covered so far to study the supervised learning problem of classification. Classification is arguably the quintessential machine learning problem, with the most advanced state of the art and the most extensive application to problems as varied as email spam detection and medical diagnosis. 

Due to space limitations, this chapter cannot provide an exhaustive review of all existing techniques and latest developments, particularly in the active field of neural network research. For instance, we do not cover decision trees here (see, e.g., [155]). Rather, we will provide a principled taxonomy of approaches, and offer a few representative techniques for each category within a unified framework. We will specifically proceed by first introducing as preliminary material the Stochastic Gradient Descent optimization method. Then, we will discuss deterministic and probabilistic discriminative models, and finally we will cover probabilistic generative models.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 119

Context: # 5
## Statistical Learning Theory*

Statistical learning theory provides a well-established theoretical framework in which to study the trade-off between the number \( N \) of available data points and the generalization performance of a trained machine. The approach formalizes the notions of model capacity, estimation error (or generalization gap), and bias that underlie many of the design choices required by supervised learning, as we have seen in the previous chapters. 

This chapter is of mathematical nature, and it departs from the algorithmic focus of the text so far. While it may be skipped at a first reading, the chapter sheds light on the key empirical observations made in the previous chapters relative to learning in a frequentist setup. It does so by covering the theoretical underpinnings of supervised learning within the classical framework of statistical learning theory. 

To this end, the chapter contains a number of formal statements with proofs. The proofs have been carefully selected in order to highlight and clarify the key theoretical ideas. This chapter follows mostly the treatment in [133].
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 133

Context: # 5.5. Summary

To elaborate, consider a probabilistic model \( \mathcal{H} \) defined as the set of all pmfs \( p(x|\theta) \) parameterized by \( \theta \) in a given set. With some abuse of notation, we take \( \mathcal{H} \) to be also the domain of parameter \( \theta \). To fix the ideas, assume that \( \theta \) takes values over a finite alphabet. We know from Section 2.5 that a distribution \( q(x) \) is associated with a lossless compression scheme that requires around \( -\log q(x) \) bits to describe a value. Furthermore, if we were informed about the true parameter \( \theta \), the minimum average coding length would be the entropy \( H(p(\cdot|\theta)) \), which requires setting \( q(x) = p(x|\theta) \) (see Appendix A).

Assume now that we only know that the parameter \( \theta \) lies in set \( \mathcal{H} \), and hence the true distribution \( p(x|\theta) \) is not known. In this case, we cannot select the true parameter distribution, and we need instead to choose a generally different distribution \( q(x) \) to define a compression scheme. With a given distribution \( q(x) \), the average coding length is given by 

\[
\bar{L}(q(x), \theta) = -\sum_{x} p(x|\theta) \log q(x). 
\]
Therefore, the choice of a generally different distribution \( q(x) \) entails a redundancy of 

\[
\Delta R(q(x), \theta) = -\sum_{x} p(x|\theta) \log \frac{q(x)}{p(x|\theta)} \geq 0 \tag{5.22}
\]
bits.

The redundancy \( \Delta R(q(x), \theta) \) in (5.22) depends on the true value of \( \theta \). Since the latter is not known, this quantity cannot be computed. We can instead obtain a computable metric by maximizing over all values of \( \theta \in \mathcal{H} \), which yields the worst-case redundancy

\[
\Delta R(q(x), \mathcal{H}) = \max_{\theta \in \mathcal{H}} \Delta R(q(x), \theta). \tag{5.23}
\]

This quantity can be minimized over \( q(x) \) yielding the so-called min-max redundancy:

\[
\Delta R(q(x)) = \min_{q} \Delta R(q(x), \mathcal{H}) \tag{5.24}
\]
\[
= \min_{q} \max_{\theta} \sum_{x} p(x|\theta) \log \frac{p(x|\theta)}{q(x)} \tag{5.25}
\]
\[
= \min_{q} \max_{\theta} \sum_{x} p(x|\theta) \log \frac{p(x|\theta)}{q(x)}. \tag{5.26}
\]

The minimum redundancy can be taken as a measure of the capacity of model \( \mathcal{H} \), since a richer model tends to yield a larger \( \Delta R(\mathcal{H}) \). In fact,
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 138

Context: # Unsupervised Learning

may wish to cluster a set \( D \) of text documents according to their topics, by modeling the latter as an unobserved label \( z_n \). Broadly speaking, this requires to group together documents that are similar according to some metric. It is important at this point to emphasize the distinction between classification and clustering: While the former assumes the availability of a labelled set of training examples and evaluates its (generalization) performance on a separate set of unlabelled examples, the latter works with a single, unlabelled, set of examples. The different notation used for the labels \( z_n \) in lieu of \( t_n \) is meant to provide a reminder of this key difference.

## Dimensions reduction and representation:
Given the set \( D \), we would like to represent the data points \( x_n \in D \) in a space of lower dimensionality. This makes it possible to highlight independent explanatory factors, and/or to ease visualization and interpretation [93], e.g., for text analysis via vector embedding (see, e.g., [124]).

## Feature extraction:
Feature extraction is the task of deriving functions of the data points \( x_n \) that provide useful lower-dimensional inputs for tasks such as supervised learning. The extracted features are unobserved, and hence latent, variables. As an example, the hidden layer of a deep neural network extracts features from the data for use by the output layer (see Sec. 4.5).

## Generation of new samples:
The goal here is to train a machine that is able to produce samples that are approximately distributed according to the true distribution \( p(x) \). For example, in computer graphics for filmmaking or gaming, one may want to train a software that is able to produce artistic scenes based on a given description.

The variety of tasks and the difficulty in providing formal definitions, e.g., on the realism of an artificially generated image, may hinder unsupervised learning, at least in its current state, a less formal field than supervised learning. Often, loss criteria in unsupervised learning measure the divergence between the learned model and the empirical data distribution, but there are important exceptions, as we will see.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 175

Context: # 7.1. Introduction

illustrated in Fig. 7.1, where we have considered \( N \) i.i.d. documents. Note that the graph is directed: in this problem, it is sensible to model the document as being caused by the topic, entailing a directed causality relationship. Learnable parameters are represented as dots. BNs are covered in Sec. 7.2.

![](path/to/figure7.2.png)  
**Figure 7.2:** MRF for the image denoising example. Only one image is shown and the learnable parameters are not indicated in order to simplify the illustration.

## Example 7.2  

The second example concerns image denoising using supervised learning. For this task, we wish to learn a joint distribution \( p(x, z | \theta) \) of the noisy image \( x \) and the corresponding desired noiseless image \( z \). We encode the images using a matrix representing the numerical values of the pixels. A structured model in this problem can account for the following reasonable assumptions:  

1. Neighboring pixels of the noiseless image are correlated, while pixels further apart are not directly dependent on one another;  
2. Noise acts independently on each pixel of the noiseless image to generate the noisy image.  

These assumptions are encoded by the MRF shown in Fig. 7.2. Note that this is an undirected model. This choice is justified by the need to capture the mutual correlation among neighboring pixels, which can be described as a directed causality relationship. We will study MRFs in Sec. 7.3.

As suggested by the examples above, structure in probabilistic models can be conveniently represented in the form of graphs. At a fundamental level, structural properties in a probabilistic model amount
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 208

Context: # Part V

## Conclusions

- This section summarizes the key findings of the study.
- Conclusions drawn from the data.
- Implications of the findings for future research.

1. **First conclusion point.**
2. **Second conclusion point.**
3. **Third conclusion point.**

### Recommendations

- **Recommendation 1:** Description of recommendation.
- **Recommendation 2:** Description of recommendation.
- **Recommendation 3:** Description of recommendation.

### References

1. Author, A. (Year). *Title of the work*. Publisher.
2. Author, B. (Year). *Title of the work*. Publisher. 

### Acknowledgements

- Acknowledgement of contributions from individuals or organizations.
####################
File: A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf

Page: 212

Context: # Appendices

## Appendix A: Title

Content of Appendix A.

## Appendix B: Title

Content of Appendix B.

## Appendix C: Title

Content of Appendix C.

## Appendix D: Title

Content of Appendix D.

## References

- Reference 1
- Reference 2
- Reference 3

## Tables

| Header 1 | Header 2 | Header 3 |
|----------|----------|----------|
| Row 1    | Data 1   | Data 2   |
| Row 2    | Data 3   | Data 4   |
| Row 3    | Data 5   | Data 6   |
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 7

Context: # MEANT FOR INDUSTRY AS WELL AS BACKGROUND READING

This book was written during my sabbatical at the Radboud University in Nijmegen (Netherlands). I would like to thank Hans for discussion on intuition. I also thank Prof. Bert Kappen, who leads an excellent group of postdocs and students for his hospitality. Marga, kids, UCI,...

---

There are a few main aspects I want to cover from a personal perspective. Instead of trying to address every aspect of the entire field, I have chosen to present a few popular and perhaps useful tools and approaches. What will (hopefully) be significantly different than most other scientific books is the manner in which I will present these methods. I have always been frustrated by the lack of proper explanation of equations. Many times, I have been staring at a formula without the slightest clue where it came from or how it was derived. Many books also excel in stating facts in an almost encyclopedic style, without providing the proper intuition of the method. This is my primary mission: to write a book that conveys intuition. The first chapter will be devoted to why I think this is important.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 9

Context: # Learning and Intuition

We have all experienced the situation that the solution to a problem presents itself while riding your bike, walking home, “relaxing” in the washroom, waking up in the morning, taking your shower, etc. Importantly, it did not appear while banging your head against the problem in a conscious effort to solve it, staring at the equations on a piece of paper. In fact, I would claim that all my bits and pieces of progress have occurred while taking a break and “relaxing out of the problem.” Greek philosophers walked in circles when thinking about a problem; most of us stare at a computer screen all day. The purpose of this chapter is to make you more aware of where your creative mind is located and to interact with it in a fruitful manner.

My general thesis is that, contrary to popular belief, creative thinking is not performed by conscious thinking. It is rather an interplay between your conscious mind, which prepares the seeds to be planted into the unconscious part of your mind. The unconscious mind will munch on the problem “out of sight” and return promising roads to solutions to the consciousness. This process iterates until the conscious mind decides the problem is sufficiently solved, intractable, or plain dull and moves on to the next. It may be a little unsettling to learn that at least part of your thinking goes on in a part of your mind that seems inaccessible and has a very limited interface with what you think of as yourself. But it is undeniable that it is there, and it is also undeniable that it plays a role in the creative thought process.

To become a creative thinker, one should learn to play this game more effectively. To do so, we should think about the language in which to represent knowledge that is most effective in terms of communication with the unconscious. In other words, what type of “interface” between conscious and unconscious mind should we use? It is probably not a good idea to memorize all the details of a complicated equation or problem. Instead, we should extract the abstract idea and capture the essence of it in a picture. This could be a movie with colors and other elements.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 11

Context: Many people may find this somewhat experimental way to introduce students to new topics counter-productive. Undoubtedly for many it will be. If you feel under-challenged and become bored, I recommend moving on to the more advanced textbooks, of which there are many excellent samples on the market (for a list see [books](#)). But I hope that for most beginning students, this intuitive style of writing may help to gain a deeper understanding of the ideas that I will present in the following. Above all, have fun!
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 14

Context: # CHAPTER 1. DATA AND INFORMATION

## Interpretation
Here we seek to answer questions about the data. For instance, what property of this drug was responsible for its high success rate? Does a security officer at the airport apply racial profiling in deciding whose luggage to check? How many natural groups are there in the data?

## Compression
Here we are interested in compressing the original data, a.k.a. the number of bits needed to represent it. For instance, files in your computer can be "zipped" to a much smaller size by removing much of the redundancy in those files. Also, JPEG and GIF (among others) are compressed representations of the original pixel map.

All of the above objectives depend on the fact that there is structure in the data. If data is completely random, there is nothing to predict, nothing to interpret, and nothing to compress. Hence, all tasks are somehow related to discovering or leveraging this structure. One could say that data is highly redundant and that this redundancy is exactly what makes it interesting. Take the example of natural images. If you are required to predict the color of the pixels neighboring some random pixel in an image, you would be able to do a pretty good job (for instance, 20% may be blue sky and predicting the neighbors of a blue sky pixel is easy). Also, if we would generate images at random, they would not look like natural scenes at all. For one, it wouldn’t contain objects. Only a tiny fraction of all possible images looks "natural," and so the space of natural images is highly structured.

Thus, all of these concepts are intimately related: structure, redundancy, predictability, regularity, interpretability, compressibility. They refer to the "food" for machine learning. Without structure, there is nothing to learn. The same thing is true for human learning. From the day we are born, we start noticing that there is structure in this world. Our survival depends on discovering and recording this structure. If I walk into this brown cylinder with a green canopy, I suddenly stop; it won’t give way. In fact, it damages my body. Perhaps this holds for all these objects. When I cry, my mother suddenly appears. Our game is to predict the future accurately, and we predict it by learning its structure.

## 1.1 Data Representation
What does "data" look like? In other words, what do we download into our computer? Data comes in many shapes and forms; for instance, it could be words from a document or pixels from an image. But it will be useful to convert data into a
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 23

Context: # Chapter 3

## Learning

This chapter is without question the most important one of the book. It concerns the core, almost philosophical question of what learning really is (and what it is not). If you want to remember one thing from this book, you will find it here in this chapter.

Ok, let’s start with an example. Alice has a rather strange ailment. She is not able to recognize objects by their visual appearance. At home, she is doing just fine: her mother explained to Alice for every object in her house what it is and how you use it. When she is home, she recognizes these objects (if they have not been moved too much), but when she enters a new environment she is lost. For example, if she enters a new meeting room, she needs a long time to infer what the chairs and the table are in the room. She has been diagnosed with a severe case of "overfitting." 

What is the matter with Alice? Nothing is wrong with her memory because she remembers the objects once she has seen them. In fact, she has a fantastic memory. She remembers every detail of the objects she has seen. And every time she sees a new object, she reasons that the object in front of her is surely not a chair because it doesn’t have all the features she has seen in earlier chairs. The problem is that Alice cannot generalize the information she has observed from one instance of a visual object category to other, yet unobserved members of the same category. The fact that Alice’s disease is so rare is understandable; there must have been a strong selection pressure against this disease. 

Imagine our ancestors walking through the savanna one million years ago. A lion appears on the scene. Ancestral Alice has seen lions before, but not this particular one, and it does not induce a fear response. Of course, she has no time to infer the possibility that this animal may be dangerous logically. Alice’s contemporaries noticed that the animal was yellow-brown, had manes, etc., and immediately un-
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 26

Context: # CHAPTER 3. LEARNING

Now let’s think for a moment what we really mean with “a model”. A model represents our prior knowledge of the world. It imposes structure that is not necessarily present in the data. We call this the **inductive bias**. Our inductive bias often comes in the form of a parameterized model. That is to say, we define a family of models but let the data determine which of these models is most appropriate. A strong inductive bias means that we don’t leave flexibility in the model for the data to work on. We are so convinced of ourselves that we basically ignore the data. The downside is that if we are creating a “bad bias” towards to wrong model. On the other hand, if we are correct, we can learn the remaining degrees of freedom in our model from very few data-cases. Conversely, we may leave the door open for a huge family of possible models. If we now let the data zoom in on the model that best explains the training data it will overfit to the peculiarities of that data. 

Now imagine you sampled 10 datasets of the same size \(N\) and train these very flexible models separately on each of these datasets (note that in reality you only have access to one such dataset but please play along in this thought experiment). Let’s say we want to determine the value of some parameter \(\theta\). Because the models are so flexible, we can actually model the idiosyncrasies of each dataset. The result is that the value for \(\theta\) is likely to be very different for each dataset. But because we didn’t impose much inductive bias the average of many of such estimates will be about right. We say that the bias is small, but the variance is high. In the case of very restrictive models the opposite happens: the bias is potentially large but the variance small. Note that not only is a large bias bad (for obvious reasons), a large variance is bad as well: because we only have one dataset of size \(N\), our estimate could be very far off simply we were unlucky with the dataset we were given. 

What we should therefore strive for is to inject all our prior knowledge into the learning problem (this makes learning easier) but avoid injecting the wrong prior knowledge. If we don’t trust our prior knowledge we should let the data speak. However, letting the data speak too much might lead to overfitting, so we need to find the boundary between too complex and too simple a model and get its complexity just right. Access to more data means that the data can speak more relative to prior knowledge. That, in a nutshell, is what machine learning is all about.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 27

Context: # 3.1 In a Nutshell

Learning is all about generalizing regularities in the training data to new, yet unobserved data. It is not about remembering the training data. Good generalization means that you need to balance prior knowledge with information from data. Depending on the dataset size, you can entertain more or less complex models. The correct size of the model can be determined by playing a compression game. 

Learning = generalization = abstraction = compression.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 28

Context: # CHAPTER 3. LEARNING

## Introduction

Learning involves acquiring knowledge, skills, attitudes, or competencies. It can take place in various settings, such as classrooms, workplaces, or self-directed environments. This chapter discusses the types and processes of learning.

## Types of Learning

1. **Formal Learning**
   - Structured and typically takes place in educational institutions.
   - Includes degrees, diplomas, and certifications.

2. **Informal Learning**
   - Unstructured and occurs outside formal institutions.
   - Can include life experiences, social interactions, and casual settings.

3. **Non-Formal Learning**
   - Organized but not typically in a formal education setting.
   - Often community-based, such as workshops and training sessions.

## Learning Processes

- **Cognitive Learning**: Involves mental processes and understanding. 
- **Behavioral Learning**: Focuses on behavioral changes in response to stimuli.
- **Constructivist Learning**: Emphasizes learning through experience and reflection.

## Table of Learning Theories

| Theory                    | Key Contributor      | Description                                      |
|--------------------------|----------------------|--------------------------------------------------|
| Behaviorism              | B.F. Skinner         | Learning as a change in behavior due to reinforcement. |
| Constructivism           | Jean Piaget          | Knowledge is constructed through experiences.    |
| Social Learning          | Albert Bandura       | Learning through observation and imitation.      |

## Conclusion

Understanding the different types and processes of learning can help educators and learners optimize educational experiences. Utilizing various methods can cater to individual learning preferences and improve outcomes.

## References

- Smith, J. (2020). *Learning Theories: A Comprehensive Approach*. Educational Press.
- Doe, A. (2019). *Techniques for Effective Learning*. Learning Publishers.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 35

Context: 5.1 The Idea In a Nutshell
==========================

To classify a new data item, you first look for the \( k \) nearest neighbors in feature space and assign it the same label as the majority of these neighbors. 

Because 98 noisy dimensions have been added, this effect is detrimental to the kNN algorithm. Once again, it is very important to choose your initial representation with much care and preprocess the data before you apply the algorithm. In this case, preprocessing takes the form of "feature selection" on which a whole book in itself could be written.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 40

Context: 28  
# CHAPTER 6. THE NATIVE BAYESIAN CLASSIFIER

For ham emails, we compute exactly the same quantity,

$$ P_{ham}(X_i = j) = \frac{ \# \text{ ham emails for which the word } j \text{ was found} }{ \text{total } \# \text{ of ham emails} } $$ (6.5)

$$ = \frac{\sum_{n} \mathbb{I}(X_{n} = j \land Y_{n} = 0)}{ \sum_{n} \mathbb{I}(Y_{n} = 0)} $$ (6.6)

Both these quantities should be computed for all words or phrases (or more generally attributes).

We have now finished the phase where we estimate the model from the data. We will often refer to this phase as "learning" or training a model. The model helps us understand how data was generated in some approximate setting. The next phase is that of prediction or classification of new email.

## 6.3 Class-Prediction for New Instances

New email does not come with a label ham or spam (if it would we could throw spam in the spam-box right away). What we do see are the attributes $\{X_i\}$. Our task is to guess the label based on the model and the measured attributes. The approach we take is simple: calculate whether the email has a higher probability of being generated from the spam or the ham model. For example, because the word "viagra" has a tiny probability of being generated under the ham model it will end up with a higher probability under the spam model. But clearly, all words have a say in this process. It’s like a large committee of experts, one for each word; each member casts a vote and can say things like: "I am 99% certain its spam", or "It’s almost definitely not spam (0.1% spam)". Each of these opinions will be multiplied together to generate a final score. We then figure out whether ham or spam has the highest score.

There is one little practical caveat with this approach, namely that the product of a large number of probabilities, each of which is necessarily smaller than one, very quickly gets so small that your computer can’t handle it. There is an easy fix though. Instead of multiplying probabilities as scores, we use the logarithms of those probabilities and add the logarithms. This is numerically stable and leads to the same conclusion because if \(a > b\) then we also have \(\log(a) > \log(b)\) and vice versa. In equations we compute the score as follows:

$$ S_{spam} = \sum_{i} \log P_{spam}(X_i = e_i) + \log P(spam) $$ (6.7)
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 43

Context: # 6.5 Remarks

One of the main limitations of the NB classifier is that it assumes independence between attributes (This is presumably the reason why we call it the naive Bayesian classifier). This is reflected in the fact that each classifier has an independent vote in the final score. However, imagine that I measure the words, "home" and "mortgage". Observing "mortgage" certainly raises the probability of observing "home". We say that they are positively correlated. It would therefore be more fair if we attributed a smaller weight to "home" if we already observed mortgage because they convey the same thing: this email is about mortgages for your home. One way to obtain a more fair voting scheme is to model these dependencies explicitly. However, this comes at a computational cost (a longer time before you receive your email in your inbox) which may not always be worth the additional accuracy. One should also note that more parameters do not necessarily improve accuracy because too many parameters may lead to overfitting.

# 6.6 The Idea In a Nutshell

Consider Figure ??. We can classify data by building a model of how the data was generated. For NB we first decide whether we will generate a data-item from class \( Y = 0 \) or class \( Y = 1 \). Given that decision we generate the values for \( D \) attributes independently. Each class has a different model for generating attributes. Classification is achieved by computing which model was more likely to generate the new data-point, biasing the outcome towards the class that is expected to generate more data.
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 81

Context: # Chapter 14

## Kernel Canonical Correlation Analysis

Imagine you are given 2 copies of a corpus of documents, one written in English, the other written in German. You may consider an arbitrary representation of the documents, but for definiteness we will use the “vector space” representation where there is an entry for every possible word in the vocabulary and a document is represented by count values for every word, i.e., if the word “the” appeared 12 times and the first word in the vocabulary we have \( X_1(doc) = 12 \), etc.

Let’s say we are interested in extracting low dimensional representations for each document. If we had only one language, we could consider running PCA to extract directions in word space that carry most of the variance. This has the ability to infer semantic relations between the words such as synonymy, because if words tend to co-occur often in documents, i.e., they are highly correlated, they tend to be combined into a single dimension in the new space. These spaces can often be interpreted as topic spaces.

If we have two translations, we can try to find projections of each representation separately such that the projections are maximally correlated. Hopefully, this implies that they represent the same topic in two different languages. In this way we can extract language independent topics.

Let \( x \) be a document in English and \( y \) a document in German. Consider the projections: \( u = a^Tx \) and \( v = b^Ty \). Also assume that the data have zero mean. We now consider the following objective:

\[
\rho = \frac{E[uv]}{\sqrt{E[u^2]E[v^2]}} \tag{14.1}
\]
####################
File: A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf

Page: 93

Context: # Bibliography

1. Author, A. (Year). *Title of the Book*. Publisher.
2. Author, B. (Year). *Title of the Article*. *Journal Name*, Volume(Issue), Page Range. DOI or URL if available.
3. Author, C. (Year). *Title of the Website*. Retrieved from URL.
4. Author, D. (Year). *Title of the Thesis or Dissertation*. University Name. 

- Point 1
- Point 2
- Point 3

## Additional References

| Author            | Title                   | Year | Publisher        |
|-------------------|-------------------------|------|------------------|
| Author, E.        | *Title of Article*      | Year | Publisher Name    |
| Author, F.        | *Title of Conference*   | Year | Conference Name   |

### Notes

- Ensure that all entries follow the same citation style for consistency.
- Check for publication dates and any updates that may be required.
##########

"""QUERY: Summarize in 5 sentences the contents of your context. Speak in Old English manner similar to how Shakespeare writes his sonnets."""

Consider the chat history for relevant information. Use all information included. Use as much tokens as needed. Important: If you find information separated by a | in the context, it is a table formatted in Markdown.Important: Take a look at the QUERY and only the QUERY. If this is vague or unclear, please ignore everything and ask a follow-up question instead!


Final Files Sources: A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 1, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 2, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 3, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 5, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 8, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 11, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 13, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 14, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 15, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 17, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 19, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 23, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 26, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 31, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 33, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 35, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 38, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 39, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 40, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 45, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 51, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 55, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 60, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 67, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 68, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 75, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 78, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 85, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 93, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 97, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 99, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 102, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 105, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 108, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 117, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 118, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 123, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 126, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 130, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 133, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 139, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 144, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 145, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 147, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 150, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 154, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 156, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 157, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 159, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 167, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 171, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 175, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 178, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 185, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 201, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 204, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 206, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 219, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 221, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 224, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 225, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 226, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 229, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 234, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 236, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 237, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 240, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 244, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 246, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 252, A%20Cool%20Brisk%20Walk%20Through%20Discrete%20Mathematics%20-%20Stephen%20Davies%20%28PDF%29.pdf - Page 257, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 1, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 2, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 11, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 13, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 17, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 19, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 20, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 50, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 80, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 82, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 119, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 133, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 138, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 175, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 208, A%20Brief%20Introduction%20to%20Machine%20Learning%20for%20Engineers%20-%20Osvaldo%20Simeone%20%28PDF%29.pdf - Page 212, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 7, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 9, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 11, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 14, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 23, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 26, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 27, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 28, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 35, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 40, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 43, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 81, A%20First%20Encounter%20with%20Machine%20Learning%20-%20Max%20Welling%20%28PDF%29.pdf - Page 93
==================================================
**Elapsed Time: 3.40 seconds**
==================================================


FINAL ANSWER 
Answer: In yon tome, penned by the learned Stephen Davies, a brisk promenade through the realm of discrete mathematics doth unfold, wherein the foundations of sets, functions, and relations are laid bare for the eager mind. The author, with a gentle hand, doth guide the reader through the labyrinthine paths of mathematical concepts, illuminating their applications in the realms of computer science and combinatorial design. Each chapter, a stepping stone, leads to the grand conclusion that such knowledge is not merely academic, but a vital key to unlocking the mysteries of modern science and technology. With a flourish, the text doth invite scholars to embrace the elegance of discrete structures, whilst cautioning against the perils of complexity and overfitting in the pursuit of knowledge. Thus, the reader is beckoned to embark upon this enlightening journey, where the beauty of mathematics intertwines with the art of reasoning and logic.
==================================================
**Elapsed Time: 3.02 seconds**
==================================================