Something that stood out to me was their discussion of "Big Data technologies" (Hadoop, HBase...) and how they "support" data mining techniques. It makes me wonder whether big data technologies are mainly for processing the data, like the ETL that feeds a data warehouse: because the data is big, we need those special technologies to process it (and to do additional data-sciencey things on it, like calculating a probability of churn or of loan default). The end results can still be those high-performance in-memory objects that we can slice and dice, just like in our data warehouses, if that's how we need to see them.
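To make that concrete, here is a toy sketch of the flow I have in mind, in plain Python rather than actual Hadoop code. Everything in it is an assumption for illustration: the records, the field names, and the `map_phase` / `reduce_phase` / `slice_cube` helpers are all made up, and a real job would read from HDFS or HBase and run distributed instead of in one process.

```python
from collections import defaultdict

# Hypothetical raw records; in a real setup these would be far too large
# for one machine and would live in something like HDFS or HBase.
RAW_EVENTS = [
    {"region": "EU", "product": "loan", "amount": 120.0},
    {"region": "EU", "product": "loan", "amount": 80.0},
    {"region": "EU", "product": "card", "amount": 50.0},
    {"region": "US", "product": "loan", "amount": 200.0},
    {"region": "US", "product": "card", "amount": None},  # malformed row
    {"region": "US", "product": "card", "amount": 30.0},
]


def map_phase(records):
    """ETL-style step: drop malformed rows, emit ((region, product), amount)."""
    for record in records:
        if record.get("amount") is None:
            continue
        yield (record["region"], record["product"]), record["amount"]


def reduce_phase(pairs):
    """Aggregate the mapped pairs into a small in-memory 'cube' of totals."""
    totals = defaultdict(float)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)


def slice_cube(cube, region=None, product=None):
    """Slice the aggregated result by either dimension, warehouse-style."""
    return {
        key: total
        for key, total in cube.items()
        if (region is None or key[0] == region)
        and (product is None or key[1] == product)
    }


# The heavy processing happens once; the result is small enough to explore.
cube = reduce_phase(map_phase(RAW_EVENTS))
eu_only = slice_cube(cube, region="EU")
loans_only = slice_cube(cube, product="loan")
```

The point of the sketch is just the shape: the expensive pass over the raw data produces a compact aggregate, and the slicing and dicing happens on that aggregate rather than on the raw data.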
This is all coming from research rather than practical experience with Hadoop / Big Data, so hopefully someone can clarify :)