📜  Difference Between Small Data and Big Data

📅  Last modified: 2021-09-11 04:00:53             🧑  Author: Mango

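Small data: a dataset small enough to inform current decisions. It covers almost any work currently in progress whose data can be accumulated in an Excel file. Small data is useful for decision-making, but it is not meant to have a large impact on the business; its effect is short-term.
In short, data that is simple enough for humans to understand, in a volume and structure that make it accessible, concise, and actionable, is called small data.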

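Big data: a large volume of structured and unstructured data. The amount of data stored is enormous, so analysts must dig through all of it thoroughly to make it relevant and useful for sound business decisions.
In short, datasets so large and complex that traditional data processing techniques cannot manage them are called big data.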
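
To make the contrast concrete, here is a minimal sketch in Python of the same calculation done two ways: once with the whole dataset held in memory, and once incrementally over fixed-size chunks, which is one common pattern when data does not fit on a single machine's memory. The function names (`mean_in_memory`, `chunks`, `mean_incremental`) and the default chunk size are assumptions made for illustration; they do not come from the article or from any particular big data framework.

```python
from statistics import mean
from typing import Iterable, Iterator, List


def mean_in_memory(values: List[float]) -> float:
    """Small-data style: the whole dataset is loaded into memory at once."""
    return mean(values)


def chunks(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks from a stream without materializing all of it."""
    chunk: List[float] = []
    for value in stream:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def mean_incremental(stream: Iterable[float], chunk_size: int = 1000) -> float:
    """Larger-than-memory style: keep only a running sum and count."""
    total = 0.0
    count = 0
    for chunk in chunks(stream, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("stream is empty")
    return total / count
```

Both functions return the same mean for the same input. The difference is that `mean_incremental` never needs more than one chunk in memory, so it can consume a file, a queue, or a generator of arbitrary length.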

Big Data vs. Small Data

The following table lists the differences between small data and big data:

| Feature | Small Data | Big Data |
| --- | --- | --- |
| Technology | Traditional | Modern |
| Collection | Generally obtained in an organized manner and then inserted into the database | Collected through pipelines with queues such as AWS Kinesis or Google Pub/Sub to balance high-speed data |
| Volume | Tens or hundreds of gigabytes | Terabytes or more |
| Analysis Areas | Data marts (analysts) | Clusters (data scientists), data marts (analysts) |
| Quality | Contains less noise, because less data is collected and it is collected in a controlled manner | Data quality is usually not guaranteed |
| Processing | Requires batch-oriented processing pipelines | Uses both batch and stream processing pipelines |
| Database | SQL | NoSQL |
| Velocity | A regulated, constant flow of data; aggregation is slow | Data arrives at extremely high speed; large volumes are aggregated in a short time |
| Structure | Structured data in tabular format with a fixed schema (relational) | A wide variety of data, including tabular data, text, audio, images, video, logs, JSON, etc. (non-relational) |
| Scalability | Usually scaled vertically | Mostly based on horizontally scaling architectures, which give more versatility at a lower cost |
| Query Language | SQL only | Python, R, Java, SQL |
| Hardware | A single server is sufficient | Requires more than one server |
| Value | Business intelligence, analysis, and reporting | Complex data mining techniques for pattern finding, recommendation, prediction, etc. |
| Optimization | Data can be optimized manually (human-powered) | Requires machine learning techniques for data optimization |
| Storage | Storage within the enterprise, on local servers, etc. | Usually requires distributed storage systems in the cloud or in external file systems |
| People | Data analysts, database administrators, and data engineers | Data scientists, data analysts, database administrators, and data engineers |
| Security | Security practices include user privileges, data encryption, hashing, etc. | Securing big data systems is much more complicated. Best practices include data encryption, cluster network isolation, strong access control protocols, etc. |
| Nomenclature | Database, data warehouse, data mart | Data lake |
| Infrastructure | Predictable resource allocation, mostly vertically scalable hardware | More agile infrastructure with horizontally scalable hardware |
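
The horizontal scaling mentioned in the Scalability and Infrastructure rows can be sketched as hash partitioning, where each record key is assigned to one of several servers. This is only an illustrative sketch: `shard_for` and `partition` are hypothetical helper names, the shards stand in for servers, and real distributed stores use their own placement schemes.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would place keys differently on every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group keys by the shard (hypothetical server) that would store them."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)
```

For example, `partition([f"user-{i}" for i in range(100)], 4)` spreads 100 keys over at most four shards. Simple modulo placement like this reassigns most keys whenever the number of shards changes, which is why production systems often prefer consistent hashing.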