Mathematics Terminologies In ML

Shubham Kumar
6 min read · Jun 12, 2021

Derivatives — A derivative measures the rate of change of a function with respect to an independent variable. It is used mainly when a quantity varies and its rate of change is not constant, and it measures how sensitive the dependent variable is to a change in the independent variable. In this article, we discuss what derivatives are, the definition of a derivative, and limits and derivatives in detail.

The formula for the derivative is:

f′(x) = lim(h→0) [f(x + h) − f(x)] / h

The four things we need to do in order to get the derivative are:

  1. Start by choosing an interval.
  2. Find the change in the function over that interval.
  3. Find the rate of change (the change divided by the interval).
  4. Take the limit as the interval approaches zero.
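The steps above can be sketched numerically in Python; the function and evaluation point here are illustrative assumptions:

```python
# Numerical derivative following the steps above: choose a small
# interval h, find the change in f, and take the rate of change.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

# Example: d/dx of x^2 at x = 3 is 6.
print(round(numerical_derivative(lambda x: x * x, 3.0), 3))  # → 6.0
```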

Slope

It is a mathematical quantity that determines both the direction and the steepness of a line. It is generally denoted by the letter m. The steepness, incline, or grade of a line is measured by the absolute value of the slope: a slope with a greater absolute value indicates a steeper line. The direction of a line is either increasing, decreasing, horizontal, or vertical.

  • A line is increasing if it goes up from left to right. The slope is positive, i.e. m>0.
  • A line is decreasing if it goes down from left to right. The slope is negative, i.e. m<0.

Below is the formula for slope, for a line through the points (x1, y1) and (x2, y2):

m = (y2 − y1) / (x2 − x1)
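A minimal Python sketch of the slope formula; the two example points are assumptions:

```python
# Slope of the line through (x1, y1) and (x2, y2): rise over run.
def slope(x1, y1, x2, y2):
    return (y2 - y1) / (x2 - x1)

print(slope(1, 2, 3, 8))  # → 3.0 (positive, so the line is increasing)
```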

Chain Rule

The chain rule states that the derivative of f(g(x)) is f′(g(x))⋅g′(x). In other words, it helps us differentiate composite functions.
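A quick check of the chain rule in Python, using sin(x²) as an assumed example composite:

```python
import math

# Chain rule: d/dx sin(x^2) = cos(x^2) * 2x, i.e. f'(g(x)) * g'(x)
# with f = sin and g(x) = x^2.
def chain_rule_derivative(x):
    return math.cos(x * x) * 2 * x

# Cross-check against a small finite difference.
def finite_difference(x, h=1e-6):
    return (math.sin((x + h) ** 2) - math.sin(x * x)) / h

x = 1.5
print(abs(chain_rule_derivative(x) - finite_difference(x)) < 1e-4)  # → True
```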

Partial Derivative

A partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary).

Below is the formula, for the partial derivative of f(x, y) with respect to x:

∂f/∂x = lim(h→0) [f(x + h, y) − f(x, y)] / h
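A numerical sketch in Python; the function f(x, y) = x²y is an assumed example:

```python
# Partial derivative with respect to x: vary x, hold y constant.
def partial_x(f, x, y, h=1e-6):
    return (f(x + h, y) - f(x, y)) / h

# Example: f(x, y) = x^2 * y, so ∂f/∂x = 2xy.
f = lambda x, y: x * x * y
print(round(partial_x(f, 2.0, 3.0), 3))  # → 12.0
```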

Definite Integral

The definite integral is defined as the limit of a Riemann sum: it gives the net area between a function and the x-axis over an interval [a, b]. Note that the notation for the definite integral is very similar to the notation for an indefinite integral.
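The Riemann-sum limit can be approximated directly in Python; the integrand and interval below are illustrative assumptions:

```python
# Definite integral as the limit of a Riemann sum: split [a, b]
# into n strips and add up the signed areas f(x) * dx.
def riemann_sum(f, a, b, n=100000):
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

# Integral of x^2 over [0, 3] is exactly 9.
print(round(riemann_sum(lambda x: x * x, 0.0, 3.0), 3))  # → 9.0
```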

Careers that use calculus in daily life: astronaut, aerospace engineer, mathematician, software developer, postsecondary teacher, economist, chemical engineer, operations research analyst.

Linear Algebra

Vectors — A mathematical quantity that has both magnitude (size) and direction.

Daily Life Examples

  • Figuring out the direction of rain and holding your umbrella in that direction.
  • To move an object in a particular direction, we will have to apply requisite force in that specific direction.

Adding Two Vectors

The sum u + v of two vectors u and v is constructed by placing u at some arbitrary location, then placing v so that v's tail coincides with u's tip; u + v is the vector that starts at u's tail and ends at v's tip.
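In coordinates, the tip-to-tail construction reduces to adding components; a minimal Python sketch with assumed example vectors:

```python
# Tip-to-tail addition in coordinates: just add the components.
def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

print(add([1, 2], [3, 4]))  # → [4, 6]
```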

Dot Product

We can calculate the dot product of two vectors this way:

a · b = |a| × |b| × cos(θ)

Where:

  • |a| is the magnitude (length) of vector a
  • |b| is the magnitude (length) of vector b
  • θ is the angle between a and b
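A Python sketch relating the component form of the dot product to the angle formula; the example vectors are assumptions:

```python
import math

# a · b = |a| × |b| × cos(θ); in coordinates it is the sum of
# the component-wise products.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def angle_between(a, b):
    cos_theta = dot(a, b) / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(cos_theta))

a, b = [1.0, 0.0], [1.0, 1.0]
print(dot(a, b))                   # → 1.0
print(round(angle_between(a, b)))  # → 45
```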

Cross Product

We can calculate the cross product this way:

a × b = |a| × |b| × sin(θ) × n

Where:

  • |a| is the magnitude (length) of vector a
  • |b| is the magnitude (length) of vector b
  • θ is the angle between a and b
  • n is the unit vector at right angles to both a and b

So the length is the length of a times the length of b times the sine of the angle between a and b. We then multiply by the unit vector n so the result points in the correct direction (at right angles to both a and b).
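The component form of the cross product can be sketched in Python; the unit vectors used below are assumed examples:

```python
# Cross product of two 3-D vectors; the result is perpendicular
# to both a and b, with length |a| |b| sin(θ).
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

print(cross([1, 0, 0], [0, 1, 0]))  # → [0, 0, 1]
```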

Vector Projection

Imagine it’s a clear day and the sun is shining down upon the Earth.

Let’s pretend that the line containing vector v is the ground. Let’s pretend that vector u is a stick with one endpoint on the ground and one endpoint in the air.

Since the sun is shining brightly, vector u would therefore cast a shadow on the ground, no?

The projection of u onto v is another vector that is parallel to v and has a length equal to what vector u’s shadow would be (if it were cast onto the ground).
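The shadow analogy corresponds to a simple formula, sketched here in Python with assumed example vectors:

```python
# The "shadow" of u on the line through v:
# proj_v(u) = ((u · v) / (v · v)) * v.
def project(u, v):
    scale = sum(ui * vi for ui, vi in zip(u, v)) / sum(vi * vi for vi in v)
    return [scale * vi for vi in v]

print(project([2.0, 3.0], [4.0, 0.0]))  # → [2.0, 0.0]
```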

Vector Norms

L1 norm — It is defined as the sum of the magnitudes of each component. For a = (a1, a2, a3):

L1 norm of a = |a1| + |a2| + |a3|

L2 norm — It is defined as the square root of the sum of squares of each component:

L2 norm of a = √(a1² + a2² + a3²)
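Both norms in a few lines of Python; the example vector is an assumption chosen so the results are easy to check:

```python
import math

# L1 norm: sum of absolute components. L2 norm: Euclidean length.
def l1_norm(a):
    return sum(abs(ai) for ai in a)

def l2_norm(a):
    return math.sqrt(sum(ai * ai for ai in a))

a = [3.0, -4.0]
print(l1_norm(a))  # → 7.0
print(l2_norm(a))  # → 5.0
```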

Matrices

In mathematics, a matrix (plural matrices) is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns.

Determinant — The determinant of a matrix is a special number that can be calculated from a square matrix.

A matrix is an array of numbers. This one has 2 rows and 2 columns:

| 3  8 |
| 4  6 |

The determinant of that matrix is:

3×6 − 8×4 = 18 − 32 = −14
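The same 2×2 calculation as a Python sketch:

```python
# Determinant of a 2x2 matrix [[a, b], [c, d]] is a*d - b*c.
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

print(det2([[3, 8], [4, 6]]))  # → -14 (i.e. 3*6 - 8*4)
```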

Eigenvalues and Eigenvectors — Eigenvalues and eigenvectors feature prominently in the analysis of linear transformations. In essence, an eigenvector v of a linear transformation T is a nonzero vector that, when T is applied to it, does not change direction. Applying T to the eigenvector only scales it by the scalar value λ, called an eigenvalue. This condition can be written as the equation

T(v) = λv

referred to as the eigenvalue equation or eigenequation. In general, λ may be any scalar. For example, λ may be negative, in which case the eigenvector reverses direction as part of the scaling, or it may be zero or complex.
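Assuming NumPy is available, a sketch of computing eigenvalues/eigenvectors and verifying the eigenvalue equation on a simple example matrix:

```python
import numpy as np

# Eigenvalues and eigenvectors satisfy A v = λ v.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

print(sorted(eigenvalues.tolist()))  # → [2.0, 3.0]

# Verify the eigenvalue equation for the first eigenpair.
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # → True
```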

Gradient Descent

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent.

Single variable — Suppose we have a differentiable function f(x) of a single variable. Given an initial value x1, gradient descent describes the steps required to converge towards a local minimum x* such that f′(x*) = 0. Each step moves against the derivative: x2 = x1 − λf′(x1), where λ is the learning rate parameter. This formula generalizes to xt+1 = xt − λf′(xt) for iteration t.

Given an initial value x1 and a large number of iterations T, this algorithm generates x1, x2, …, xT, where xT ≈ x*. That is, xt+1 converges towards x* as t gets very large. Notice that at convergence f′(xt) ≈ 0 and hence |xt+1 − xt| ≈ 0. Therefore a common stopping criterion for gradient descent is to iterate until |xt+1 − xt| is a very small number; for example, we can keep iterating until |xt+1 − xt| < 0.001.
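A minimal Python sketch of single-variable gradient descent, assuming the illustrative example f(x) = (x − 2)² with f′(x) = 2(x − 2), whose minimum is at x* = 2:

```python
# Gradient descent: step against the derivative until the steps
# become very small.
def gradient_descent(f_prime, x, lr=0.1, tol=1e-3, max_iter=10000):
    for _ in range(max_iter):
        x_next = x - lr * f_prime(x)  # x_{t+1} = x_t - λ f'(x_t)
        if abs(x_next - x) < tol:     # stop when |x_{t+1} - x_t| is tiny
            return x_next
        x = x_next
    return x

x_star = gradient_descent(lambda x: 2 * (x - 2), x=10.0)
print(round(x_star, 1))  # → 2.0
```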

Multivariable — Remember, gradient descent is an algorithm to find a minimum of a function. For a multivariate target function the aim is the same: find the minimum of a function with more than one variable, stepping against the gradient vector rather than a single derivative.
