Overview
CircleNet Analytics is a big data analytics platform that processes and analyzes social media data at scale using Hadoop MapReduce. Built on Hadoop 2.7.7, it provides tools for analyzing user behavior, relationships, and engagement patterns across a social network dataset. The platform processes three core datasets:
CircleNetPage
User profiles with 200,000 entries including nicknames, job titles, regions, and favorite hobbies
Follows
20 million follow relationships tracking social connections and timestamps
ActivityLog
10 million user actions including page views, pokes, and interactions
Key features
- MapReduce analytics: Run distributed analytics jobs across large datasets using Hadoop’s parallel processing capabilities
- Optimized implementations: Compare each simple MapReduce job against an optimized version that uses a combiner for better performance
- Dockerized environment: Complete Hadoop cluster setup with HDFS, web UIs, and monitoring tools
- Scalable design: Process millions of records efficiently with proper data partitioning and aggregation
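To make the simple vs. optimized comparison concrete, here is a minimal plain-Python sketch of what a combiner changes, using hobby counting (Task A) as the example. It is not the platform's Hadoop code: the function names, the `hobby` record key, and the sample splits are all hypothetical, and the "shuffle" is just a list. The point it illustrates is that a combiner pre-aggregates each mapper's output, so fewer key/value pairs reach the reducer while the result stays the same.

```python
from collections import Counter

def map_hobby(record):
    """Map step: emit (hobby, 1) for one profile record (hypothetical schema)."""
    yield record["hobby"], 1

def combine(pairs):
    """Combiner: pre-sum the counts produced by a single mapper."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_counts(pairs):
    """Reduce step: sum all values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def run(splits, use_combiner):
    """Run the job over input splits; return (result, number of shuffled pairs)."""
    shuffled = []
    for split in splits:
        mapped = [pair for record in split for pair in map_hobby(record)]
        shuffled.extend(combine(mapped) if use_combiner else mapped)
    return reduce_counts(shuffled), len(shuffled)

# Two tiny input splits standing in for two mappers (made-up sample data).
splits = [
    [{"hobby": "chess"}, {"hobby": "chess"}, {"hobby": "hiking"}],
    [{"hobby": "chess"}, {"hobby": "hiking"}, {"hobby": "hiking"}],
]

simple_result, simple_shuffled = run(splits, use_combiner=False)
optimized_result, optimized_shuffled = run(splits, use_combiner=True)

print(simple_result, simple_shuffled)        # {'chess': 3, 'hiking': 3} 6
print(optimized_result, optimized_shuffled)  # {'chess': 3, 'hiking': 3} 4
```

Summing is associative and commutative, which is why the same logic can safely serve as both combiner and reducer here. Jobs that compute something like an average need a different intermediate value (for example a sum and a count) before a combiner can be applied.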
Get started
Quickstart
Run your first MapReduce job in 5 minutes
Setup guide
Complete Docker and Hadoop installation instructions
Dataset overview
Learn about the CircleNet data structure
Analytics tasks
Explore all 8 available analytics tasks
What you can analyze
CircleNet Analytics supports eight analytics tasks:
- Task A: Report the frequency of each favorite hobby on CircleNet
- Task B: Find the 10 most popular CircleNetPages based on activity
- Task C: Find all users whose hobby matches a specific interest
- Task D: Compute the popularity factor (follower count) for each page owner
- Task E: Determine user favorites by analyzing access patterns
- Task F: Report owners more popular than the average user
- Task G: Identify outdated pages with no activity in 90 days
- Task H: Find users who follow someone in their region but aren’t followed back
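As an illustration of how one of these tasks breaks down, here is a small plain-Python sketch in the spirit of Task B: count actions per page, then keep the most active pages. It assumes that "popular" means the number of ActivityLog records that target a page, and that each record can be reduced to a `(user_id, page_id)` pair. Both are assumptions for this example, as are the function names; the platform's actual job may define popularity differently.

```python
import heapq
from collections import Counter

def count_activity(activity_log):
    """Count how many actions target each page (the aggregation stage)."""
    counts = Counter()
    for _user_id, page_id in activity_log:
        counts[page_id] += 1
    return counts

def top_pages(counts, k=10):
    """Return the k most active pages as (page_id, count) pairs.

    Ties are broken by ascending page id so the output is deterministic.
    """
    return heapq.nsmallest(
        k, counts.items(), key=lambda item: (-item[1], item[0])
    )

# Made-up sample log: (user_id, page_id) pairs.
activity_log = [(1, 7), (2, 7), (3, 7), (1, 9), (2, 9), (4, 5)]

counts = count_activity(activity_log)
print(top_pages(counts, k=2))  # [(7, 3), (9, 2)]
```

In a MapReduce setting the counting stage maps naturally onto a map/reduce pair with a combiner, while the final top-k selection only needs to see the per-page totals, so it can run in a single reducer or a small second job.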
Architecture
The platform runs on a containerized Hadoop cluster with the following components:
- Hadoop HDFS: Distributed file system for storing datasets
- MapReduce Engine: Parallel processing framework for analytics
- NameNode Web UI: Monitor HDFS at http://localhost:3002
- Job Tracker: Track MapReduce job progress and performance
All analytics tasks include both simple and optimized implementations, allowing you to compare performance and understand MapReduce optimization techniques.
Next steps
Set up your environment
Follow the setup guide to configure Docker, Hadoop, and HDFS
Run your first job
Complete the quickstart to analyze hobby frequencies