# NumPy — The Fundamental Package for Scientific Computing

> NumPy is the foundation of the Python scientific computing ecosystem. It provides high-performance multidimensional arrays, mathematical functions, linear algebra, random number generation, and Fourier transforms — powering pandas, scikit-learn, TensorFlow, and more.

## Quick Use

```bash
# Install NumPy
pip install numpy

# Quick demo
python3 -c "
import numpy as np

# Create arrays
a = np.array([1, 2, 3, 4, 5])
b = np.random.randn(3, 4)  # 3x4 random matrix

print(f'Mean: {b.mean():.3f}, Std: {b.std():.3f}')
print(f'Matrix shape: {b.shape}')
print(f'Dot product: {np.dot(a[:3], a[2:])}')
"
```

## Introduction

NumPy (Numerical Python) is the bedrock of the Python data science and machine learning ecosystem. Every major scientific Python library — pandas, scikit-learn, TensorFlow, PyTorch, SciPy, matplotlib — is built on top of NumPy arrays or an array interface modeled on them. It exposes C-speed array operations from Python, making numerical computation typically 10-100x faster than equivalent loops over pure Python lists.

With tens of thousands of GitHub stars and a history spanning 20+ years (tracing back to Numeric, its predecessor), NumPy is one of the most fundamental open-source libraries in computing. It defines the array computing standard that the entire ecosystem builds upon.

## What NumPy Does

NumPy provides the ndarray (n-dimensional array) object and a comprehensive collection of mathematical functions that operate on these arrays. Instead of writing Python loops to process data element by element, NumPy operations work on entire arrays at once (vectorization), leveraging optimized C and Fortran code under the hood.
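The loop-versus-vectorization difference can be made concrete with a small, self-contained timing sketch. The exact speedup varies by machine and workload, so treat the ratio as illustrative, not a benchmark:

```python
import time
import numpy as np

n = 1_000_000
xs = [i * 0.001 for i in range(n)]   # plain Python list
arr = np.arange(n) * 0.001           # equivalent NumPy array

# Pure-Python loop: one interpreted operation per element
t0 = time.perf_counter()
loop_result = [x * x + 1.0 for x in xs]
loop_time = time.perf_counter() - t0

# Vectorized: one expression, the loop runs in compiled C
t0 = time.perf_counter()
vec_result = arr * arr + 1.0
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")

# Both paths compute the same values
assert np.allclose(loop_result, vec_result)
```

The vectorized expression allocates intermediate arrays, but it still wins decisively because each ufunc call amortizes interpreter overhead over the whole array.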
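A minimal sketch of the fixed-type, contiguous storage that makes this possible (shapes and byte counts below assume the default C-order layout):

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# Fixed element type: every item is exactly 8 bytes of raw float64 data
print(a.dtype, a.itemsize)    # float64 8
print(a.nbytes)               # 2 * 3 * 8 = 48 bytes of payload

# Strided layout: stepping one column moves 8 bytes,
# stepping one row moves 3 * 8 = 24 bytes
print(a.strides)              # (24, 8)

# A Python list of the same six numbers would instead store six
# pointers to separate float objects scattered on the heap.
```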
## Architecture Overview

```
[Python Code]
  np.dot(A, B)    np.linalg.solve(A, b)
          |
[NumPy Python API]
  Array creation, indexing, broadcasting rules
          |
[NumPy C Core]
  ndarray memory layout (contiguous, strided)
          |
    +---------------+---------------+
    |               |               |
[BLAS/LAPACK]   [C loops]       [ufunc]
Linear algebra  Element-wise    Vectorized
(OpenBLAS, MKL) operations     element-wise ops
          |
[Hardware: CPU SIMD instructions]
```

## Usage Examples

```python
import numpy as np

# Array creation
zeros = np.zeros((3, 4))                  # 3x4 zero matrix
ones = np.ones((2, 3), dtype=np.float32)
arange = np.arange(0, 10, 0.5)            # [0, 0.5, 1.0, ..., 9.5]
linspace = np.linspace(0, 1, 100)         # 100 points from 0 to 1

# Vectorized operations (no loops needed)
a = np.random.randn(1_000_000)
b = np.random.randn(1_000_000)
c = a * b + np.sin(a)  # operates on entire arrays at C speed

# Linear algebra
A = np.random.randn(100, 100)
b = np.random.randn(100)
x = np.linalg.solve(A, b)           # solve Ax = b
eigenvalues = np.linalg.eigvals(A)

# Broadcasting
matrix = np.random.randn(5, 3)       # 5x3
col_means = matrix.mean(axis=0)      # mean over rows -> shape (3,)
normalized = matrix - col_means      # (3,) auto-broadcasts across 5 rows

# Fancy indexing
mask = a > 0
positive_values = a[mask]
top_10_indices = np.argsort(a)[-10:]
```

## Key Features

- **ndarray** — fast N-dimensional array object with C-backed memory
- **Vectorization** — operate on entire arrays without Python loops
- **Broadcasting** — automatically align arrays of different shapes
- **Linear Algebra** — matrix operations via BLAS/LAPACK (linalg module)
- **Random Number Generation** — comprehensive random sampling (default_rng)
- **FFT** — Fast Fourier Transform for signal processing
- **Fancy Indexing** — boolean and integer array indexing
- **Memory Efficiency** — typed arrays use far less memory than Python lists

## Comparison with Similar Tools

| Feature | NumPy | PyTorch Tensors | JAX | CuPy | Dask Array |
|---|---|---|---|---|---|
| Primary Use | General Computation | Deep Learning | Research | GPU Computing | Distributed |
| GPU Support | No | Yes | Yes | Yes (CUDA) | Optional |
| Auto-Differentiation | No | Yes (autograd) | Yes | No | No |
| API Compatibility | Standard | NumPy-like | NumPy-like | NumPy-like | NumPy-like |
| JIT Compilation | No | torch.compile | Yes (XLA) | No | No |
| Larger-than-RAM | No | No | No | No | Yes |
| Ecosystem Role | Foundation | DL Framework | DL Research | GPU NumPy | Parallel NumPy |

## FAQ

**Q: Why is NumPy faster than Python lists?**
A: NumPy arrays store elements in contiguous memory with a fixed type (e.g., float64), enabling SIMD CPU instructions and cache-friendly access patterns. Python lists store pointers to scattered objects, and every element-wise operation pays per-object type-checking overhead.

**Q: What is broadcasting?**
A: Broadcasting lets NumPy operate on arrays of different shapes by virtually expanding the smaller array, without copying it. For example, adding a shape (3,) array to a (5, 3) matrix adds that row to each of the 5 rows.

**Q: Should I learn NumPy or pandas first?**
A: Learn NumPy basics first — it takes a few hours. pandas builds on NumPy and is what you will use day-to-day for data analysis. Understanding NumPy arrays helps you use pandas more effectively.

**Q: How do I speed up NumPy code?**
A: First, vectorize (replace loops with array operations). Then consider Numba (`@jit` decorator) for remaining loops, or CuPy for GPU acceleration. Ensure NumPy is linked against an optimized BLAS (OpenBLAS or MKL).

## Sources

- GitHub: https://github.com/numpy/numpy
- Documentation: https://numpy.org
- Originally created by Travis Oliphant
- License: BSD 3-Clause

---

Source: https://tokrepo.com/en/workflows/1066ae73-366d-11f1-9bc6-00163e2b0d79
Author: AI Open Source