# NumPy — The Fundamental Package for Scientific Computing

> NumPy is the foundation of the Python scientific computing ecosystem. It provides high-performance multidimensional arrays, mathematical functions, linear algebra, random number generation, and Fourier transforms — powering pandas, scikit-learn, TensorFlow, and more.

## Quick Use

```bash
# Install NumPy
pip install numpy

# Quick demo
python3 -c "
import numpy as np

# Create arrays
a = np.array([1, 2, 3, 4, 5])
b = np.random.randn(3, 4)  # 3x4 random matrix

print(f'Mean: {b.mean():.3f}, Std: {b.std():.3f}')
print(f'Matrix shape: {b.shape}')
print(f'Dot product: {np.dot(a[:3], a[2:])}')
"
```

## Introduction

NumPy (Numerical Python) is the bedrock of the Python data science and machine learning ecosystem. Every major scientific Python library — pandas, scikit-learn, TensorFlow, PyTorch, SciPy, matplotlib — is built on top of NumPy arrays or an array interface modeled on them. It exposes C-speed array operations from Python, making numerical computation typically 10-100x faster than equivalent loops over pure Python lists.

With tens of thousands of GitHub stars and a history spanning 20+ years (tracing back to Numeric, its predecessor), NumPy is one of the most fundamental open-source libraries in computing. It defines the array computing standard that the entire ecosystem builds upon.

## What NumPy Does

NumPy provides the ndarray (n-dimensional array) object and a comprehensive collection of mathematical functions that operate on these arrays. Instead of writing Python loops to process data element by element, NumPy operations work on entire arrays at once (vectorization), leveraging optimized C and Fortran code under the hood.
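The loop-versus-vectorization difference can be made concrete with a small, self-contained timing sketch. The exact speedup varies by machine and workload, so treat the ratio as illustrative, not a benchmark:

```python
import time
import numpy as np

n = 1_000_000
xs = [i * 0.001 for i in range(n)]   # plain Python list
arr = np.arange(n) * 0.001           # equivalent NumPy array

# Pure-Python loop: one interpreted operation per element
t0 = time.perf_counter()
loop_result = [x * x + 1.0 for x in xs]
loop_time = time.perf_counter() - t0

# Vectorized: one expression, the loop runs in compiled C
t0 = time.perf_counter()
vec_result = arr * arr + 1.0
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")

# Both paths compute the same values
assert np.allclose(loop_result, vec_result)
```

The vectorized expression allocates intermediate arrays, but it still wins decisively because each ufunc call amortizes interpreter overhead over the whole array.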
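A minimal sketch of the fixed-type, contiguous storage that makes this possible (shapes and byte counts below assume the default C-order layout):

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# Fixed element type: every item is exactly 8 bytes of raw float64 data
print(a.dtype, a.itemsize)    # float64 8
print(a.nbytes)               # 2 * 3 * 8 = 48 bytes of payload

# Strided layout: stepping one column moves 8 bytes,
# stepping one row moves 3 * 8 = 24 bytes
print(a.strides)              # (24, 8)

# A Python list of the same six numbers would instead store six
# pointers to separate float objects scattered on the heap.
```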
## Architecture Overview

```
[Python Code]
  np.dot(A, B)    np.linalg.solve(A, b)
          |
[NumPy Python API]
  Array creation, indexing, broadcasting rules
          |
[NumPy C Core]
  ndarray memory layout (contiguous, strided)
          |
    +---------------+---------------+
    |               |               |
[BLAS/LAPACK]   [C loops]       [ufunc]
Linear algebra  Element-wise    Vectorized
(OpenBLAS, MKL) operations     element-wise ops
          |
[Hardware: CPU SIMD instructions]
```

## Usage Examples

```python
import numpy as np

# Array creation
zeros = np.zeros((3, 4))                  # 3x4 zero matrix
ones = np.ones((2, 3), dtype=np.float32)
arange = np.arange(0, 10, 0.5)            # [0, 0.5, 1.0, ..., 9.5]
linspace = np.linspace(0, 1, 100)         # 100 points from 0 to 1

# Vectorized operations (no loops needed)
a = np.random.randn(1_000_000)
b = np.random.randn(1_000_000)
c = a * b + np.sin(a)  # operates on entire arrays at C speed

# Linear algebra
A = np.random.randn(100, 100)
b = np.random.randn(100)
x = np.linalg.solve(A, b)           # solve Ax = b
eigenvalues = np.linalg.eigvals(A)

# Broadcasting
matrix = np.random.randn(5, 3)       # 5x3
col_means = matrix.mean(axis=0)      # mean over rows -> shape (3,)
normalized = matrix - col_means      # (3,) auto-broadcasts across 5 rows

# Fancy indexing
mask = a > 0
positive_values = a[mask]
top_10_indices = np.argsort(a)[-10:]
```

## Key Features

- **ndarray** — fast N-dimensional array object with C-backed memory
- **Vectorization** — operate on entire arrays without Python loops
- **Broadcasting** — automatically align arrays of different shapes
- **Linear Algebra** — matrix operations via BLAS/LAPACK (linalg module)
- **Random Number Generation** — comprehensive random sampling (default_rng)
- **FFT** — Fast Fourier Transform for signal processing
- **Fancy Indexing** — boolean and integer array indexing
- **Memory Efficiency** — typed arrays use far less memory than Python lists

## Comparison with Similar Tools

| Feature | NumPy | PyTorch Tensors | JAX | CuPy | Dask Array |
|---|---|---|---|---|---|
| Primary Use | General Computation | Deep Learning | Research | GPU Computing | Distributed |
| GPU Support | No | Yes | Yes | Yes (CUDA) | Optional |
| Auto-Differentiation | No | Yes (autograd) | Yes | No | No |
| API Compatibility | Standard | NumPy-like | NumPy-like | NumPy-like | NumPy-like |
| JIT Compilation | No | torch.compile | Yes (XLA) | No | No |
| Larger-than-RAM | No | No | No | No | Yes |
| Ecosystem Role | Foundation | DL Framework | DL Research | GPU NumPy | Parallel NumPy |

## FAQ

**Q: Why is NumPy faster than Python lists?**
A: NumPy arrays store elements in contiguous memory with a fixed type (e.g., float64), enabling SIMD CPU instructions and cache-friendly access patterns. Python lists store pointers to scattered objects, and every element-wise operation pays per-object type-checking overhead.

**Q: What is broadcasting?**
A: Broadcasting lets NumPy operate on arrays of different shapes by virtually expanding the smaller array, without copying it. For example, adding a shape (3,) array to a (5, 3) matrix adds that row to each of the 5 rows.

**Q: Should I learn NumPy or pandas first?**
A: Learn NumPy basics first — it takes a few hours. pandas builds on NumPy and is what you will use day-to-day for data analysis. Understanding NumPy arrays helps you use pandas more effectively.

**Q: How do I speed up NumPy code?**
A: First, vectorize (replace loops with array operations). Then consider Numba (`@jit` decorator) for remaining loops, or CuPy for GPU acceleration. Ensure NumPy is linked against an optimized BLAS (OpenBLAS or MKL).

## Sources

- GitHub: https://github.com/numpy/numpy
- Documentation: https://numpy.org
- Originally created by Travis Oliphant
- License: BSD 3-Clause

---

Source: https://tokrepo.com/en/workflows/1066ae73-366d-11f1-9bc6-00163e2b0d79
Author: AI Open Source