Welcome to CircleNet Analytics

Overview

CircleNet Analytics is a big data analytics platform that processes and analyzes social media data at scale with Hadoop MapReduce. Built on Hadoop 2.7.7, it provides tools for analyzing user behavior, relationships, and engagement patterns across a social network dataset.

The platform processes three core datasets:

CircleNetPage

200,000 user profiles, including nicknames, job titles, regions, and favorite hobbies

Follows

20 million follow relationships tracking social connections and timestamps

ActivityLog

10 million user actions including page views, pokes, and interactions

Key features

MapReduce analytics: Run distributed analytics jobs across large datasets using Hadoop’s parallel processing capabilities

Optimized implementations: Compare simple vs. optimized MapReduce jobs with combiner support for better performance

Dockerized environment: Complete Hadoop cluster setup with HDFS, web UIs, and monitoring tools

Scalable design: Process millions of records efficiently with proper data partitioning and aggregation
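
To illustrate the difference between a simple job and one that uses a combiner, here is a minimal in-memory sketch in plain Python. It is not the platform's Hadoop code: the sample hobbies, the two input splits, and the function names (map_hobby, combine, reduce_counts) are all hypothetical, and the real CircleNetPage record layout is not shown here. The sketch only shows that pre-summing counts on the map side gives the same totals while sending fewer pairs through the shuffle.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: one favorite hobby per profile record,
# grouped into two input splits (one list per mapper).
splits = [
    ["chess", "hiking", "chess", "chess"],  # split handled by mapper 1
    ["hiking", "chess", "cooking"],         # split handled by mapper 2
]

def map_hobby(hobby):
    """Map step: emit (hobby, 1) for one profile record."""
    return (hobby, 1)

def combine(pairs):
    """Combiner: pre-sum the counts produced by a single mapper."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_counts(pairs):
    """Reduce step: sum all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Simple job: every mapped pair crosses the shuffle.
simple_shuffle = [map_hobby(h) for split in splits for h in split]

# Optimized job: each mapper's output is combined before the shuffle.
combined_shuffle = [
    pair
    for split in splits
    for pair in combine(map_hobby(h) for h in split)
]

simple_result = reduce_counts(simple_shuffle)
combined_result = reduce_counts(combined_shuffle)

print(len(simple_shuffle), len(combined_shuffle))
print(simple_result == combined_result)
```

Summation is associative and commutative, which is why the same logic can serve as both combiner and reducer here. Aggregations without that property, such as a plain average, generally need a different combiner design.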

What you can analyze

CircleNet Analytics supports eight analytics tasks:

Task A: Report the frequency of each favorite hobby on CircleNet

Task B: Find the 10 most popular CircleNetPages based on activity

Task C: Find all users whose hobby matches a specific interest

Task D: Compute the popularity factor (follower count) for each page owner

Task E: Determine user favorites by analyzing access patterns

Task F: Report owners more popular than the average user

Task G: Identify outdated pages with no activity in 90 days

Task H: Find users who follow someone in their region but aren’t followed back
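
As a rough illustration of how Tasks D and F relate, the following plain-Python sketch computes a follower count per owner and then keeps the owners above the average. It rests on several assumptions: Follows records are modeled as hypothetical (follower_id, followed_id) pairs, the owner ids are made up, and the average is taken over all owners, including those with no followers. The actual jobs may define these differently.

```python
from collections import Counter

# Hypothetical Follows records as (follower_id, followed_id) pairs.
follows = [
    (1, 2), (3, 2), (4, 2),
    (1, 3), (2, 3),
    (2, 1),
]
# Hypothetical list of all page owner ids, so that owners
# with zero followers still appear in the result.
all_owners = [1, 2, 3, 4]

def follower_counts(follow_pairs, owners):
    """Stage 1 (Task D): count followers per owner, including zeros."""
    counts = Counter(followed for _, followed in follow_pairs)
    return {owner: counts.get(owner, 0) for owner in owners}

def above_average(counts):
    """Stage 2 (Task F): keep owners whose count exceeds the mean."""
    average = sum(counts.values()) / len(counts)
    return {owner: n for owner, n in counts.items() if n > average}

counts = follower_counts(follows, all_owners)
popular = above_average(counts)

print(counts)
print(popular)
```

The second stage depends on a global value (the average) that is only known once the first stage has finished, so a task like this is usually expressed as more than one MapReduce pass.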

Architecture

The platform runs on a containerized Hadoop cluster with the following components:

Hadoop HDFS: Distributed file system for storing datasets

MapReduce Engine: Parallel processing framework for analytics

NameNode Web UI: Monitor HDFS at http://localhost:3002

Job Tracker: Track MapReduce job progress and performance

All analytics tasks include both simple and optimized implementations, allowing you to compare performance and understand MapReduce optimization techniques.

Next steps

Set up your environment

Follow the setup guide to configure Docker, Hadoop, and HDFS

Load your data

Learn how to upload the CircleNet datasets to HDFS

Run your first job

Complete the quickstart to analyze hobby frequencies

Explore advanced tasks

Dive into complex analytics with joins and multi-stage MapReduce
