## General questions

### Can I use a different AI model?

The following Gemini models are supported:

- `gemini-2.5-flash` (default, recommended)
- `gemini-2.5-pro`
- Other Gemini model variants

Set the model in your `.env` file. Other providers are not supported out of the box; using them would require modifying `src/analyzer.py` to implement their specific API interfaces.

### Will this delete my tweets automatically?

No. The tool:
- Analyzes tweets against your criteria
- Generates a CSV file with URLs of flagged tweets
- Provides recommendations
- Never connects to Twitter/X API
- Never deletes anything automatically
- Requires manual review and deletion
### What if I disagree with a flagged tweet?

- Simply don’t delete it
- Skip to the next flagged tweet
- Optionally mark it as reviewed in the CSV by changing the `deleted` column to `true` (to track your progress)
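If you track progress this way, the update can be scripted with the standard library. This is a sketch: the `deleted` column comes from this FAQ, but the `url` column name is an assumption about the `results.csv` layout.

```python
import csv

def mark_reviewed(csv_path: str, tweet_url: str) -> None:
    """Set the `deleted` column to 'true' for one flagged tweet.

    The `url` column name is an assumption about results.csv;
    adjust it to match the actual header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fields = reader.fieldnames

    for row in rows:
        if row.get("url") == tweet_url:
            row["deleted"] = "true"

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```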
### Can I analyze tweets without planning to delete them?

Yes. Common use cases include:

- Content audit for professional branding
- Analyzing your posting patterns over time
- Identifying topics you’ve tweeted about
- Finding tweets that need editing or clarification
- Preparing for job applications or media appearances
The `results.csv` file serves as an audit report you can review without taking any action.

### How accurate is the AI analysis?

Accuracy depends on how clearly you define your criteria in `config.json`.

Factors affecting accuracy:

- Clarity of your `topics_to_exclude`
- Specificity of your `forbidden_words`
- Detail in your `additional_instructions`
- The AI model’s interpretation of context

Recommended approach:
- Start with a small sample (5-10 tweets)
- Review the results
- Refine your criteria
- Test again on a larger sample
- Iterate until you’re satisfied
## Usage and workflow

### How long does the analysis take?

Run time depends on:

- Number of tweets
- API rate limiting
- Your `RATE_LIMIT_SECONDS` setting

Rough estimates with default settings:
- 100 tweets: ~2 minutes
- 1,000 tweets: ~17 minutes
- 10,000 tweets: ~3 hours
- 50,000 tweets: ~14 hours
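The estimates above work out to roughly one request per second, so you can ballpark your own run (the one-second default here is an assumption inferred from the table; adjust it to your `RATE_LIMIT_SECONDS`):

```python
def estimated_runtime(num_tweets: int, seconds_per_tweet: float = 1.0) -> str:
    """Rough wall-clock estimate for an analysis run.

    seconds_per_tweet is inferred from the estimates above
    (~1,000 tweets in ~17 minutes); tune it to your settings.
    """
    total = int(num_tweets * seconds_per_tweet)
    hours, rem = divmod(total, 3600)
    minutes = rem // 60
    return f"~{hours}h {minutes:02d}m"
```

For example, `estimated_runtime(10_000)` returns `~2h 46m`, in line with the "~3 hours" estimate above.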
For long runs, you can:

- Stop and resume at any time (Ctrl+C)
- Spread analysis over multiple days
- Run in the background
### What happens if the process is interrupted?

The tool handles interruptions gracefully, whether caused by:

- Ctrl+C
- A system crash
- Network issues
- API quota exhaustion

On the next run, it will:

- Load the last checkpoint from `data/checkpoint.txt`
- Resume from exactly where it left off
- Append new results to `results.csv`
- Not re-process already analyzed tweets
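The resume behavior described above follows a simple checkpoint pattern. This is a minimal sketch, assuming the checkpoint file holds a single tweet index; the project's real implementation may differ.

```python
from pathlib import Path

CHECKPOINT = Path("data/checkpoint.txt")

def load_checkpoint() -> int:
    """Return the index of the next tweet to process (0 on a fresh run)."""
    if CHECKPOINT.exists():
        return int(CHECKPOINT.read_text().strip())
    return 0

def save_checkpoint(index: int) -> None:
    CHECKPOINT.parent.mkdir(parents=True, exist_ok=True)
    CHECKPOINT.write_text(str(index))

def analyze_all(tweets: list[str], analyze) -> list[str]:
    """Resume from the last checkpoint; skip already-processed tweets."""
    results = []
    for i in range(load_checkpoint(), len(tweets)):
        results.append(analyze(tweets[i]))
        save_checkpoint(i + 1)  # safe to interrupt after every tweet
    return results
```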
### Can I adjust the batch size?

Yes, in `src/config.py`.

A larger batch size means:

- Faster overall processing
- Less frequent checkpoint saves
- More tweets to re-process if interrupted
- Higher memory usage

A smaller batch size means:

- Slower overall processing
- More frequent checkpoint saves
- Minimal loss if interrupted
- Lower memory usage
### How do I restart the analysis from scratch?

Delete `data/checkpoint.txt` (and `results.csv`, if you want a clean report), then run the analyze command again.
### Are retweets analyzed?

No, retweets are skipped because:

- Retweets represent content you shared, not created
- They start with “RT @username”
- Analyzing them would waste API quota
- You typically want to audit your original content

What is analyzed:

- Original tweets
- Replies
- Quote tweets (your added commentary)
- Threads

See `src/application.py:125-126`.

## Configuration and customization
### What happens if I don't create config.json?

The tool still works without it; you can create `config.json` later to refine your criteria.

### How do forbidden_words work?
`forbidden_words` performs exact word matching (case-insensitive). For example, with forbidden words “crypto”, “damn”, and “wtf”:

- “Crypto is the future!” ✓ (contains “crypto”)
- “Damn, that’s interesting” ✓ (contains “damn”)
- “wtf is happening” ✓ (contains “wtf”)
- “Cryptocurrency adoption” ✗ (different word)
- “Cryptography basics” ✗ (different word)
- “Damnation” ✗ (different word)
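That matching rule can be approximated with a word-boundary regex. This is a sketch of the behavior described above, not the project's actual implementation:

```python
import re

def contains_forbidden_word(text: str, forbidden_words: list[str]) -> bool:
    """Exact, case-insensitive whole-word matching, per the examples above."""
    for word in forbidden_words:
        # \b word boundaries: "crypto" matches "Crypto is the future!"
        # but not "cryptocurrency" or "cryptography"
        if re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE):
            return True
    return False
```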
### What's the difference between topics_to_exclude and tone_requirements?

`topics_to_exclude` covers:

- Content categories you want to avoid
- Subject matter restrictions
- Thematic filters

`tone_requirements` covers:

- Stylistic and language rules
- Communication style preferences
- Manner of expression
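As an illustration, a `config.json` combining the two kinds of criteria might look like this. The field names appear in this FAQ; the value types and example entries are assumptions:

```json
{
  "topics_to_exclude": ["politics", "cryptocurrency"],
  "forbidden_words": ["damn", "wtf"],
  "tone_requirements": ["professional language", "no sarcasm or profanity"],
  "additional_instructions": "Flag anything that could read poorly in a job search."
}
```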
Think of `topics_to_exclude` as “what you talk about” and `tone_requirements` as “how you say it.”

### How can I test my criteria before running on all tweets?

Start with a small sample (5-10 tweets), review the results, refine your criteria, and repeat until the flags match your expectations.

### Can I adjust the rate limiting?
Yes, via the `RATE_LIMIT_SECONDS` setting in your `.env` file.

Increase it (slower requests) if:

- You’re hitting rate limits frequently
- You want to be conservative with API usage
- You’re running analysis overnight (no rush)

Decrease it (faster requests) if:

- You have API quota to spare
- You want faster processing
- You’re on a paid API tier with higher limits
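Adjusting it is a one-line change in `.env`; only the variable name comes from the project, and the value shown is illustrative:

```shell
# .env — conservative pacing: one request every 5 seconds
RATE_LIMIT_SECONDS=5
```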
## Cost and API limits

### How much does it cost to analyze my tweets?

The Gemini API free tier provides:

- 15 requests per minute
- 1,500 requests per day
- Free input/output tokens for moderate use

Estimated cost by archive size:

- 1,000 tweets: Free (within daily limit)
- 5,000 tweets: Free (spread over 4 days)
- 10,000 tweets: Free (spread over 7 days)
- 50,000+ tweets: May need paid API tier
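The day counts above follow directly from the 1,500 requests/day cap:

```python
import math

def free_tier_days(num_tweets: int, daily_limit: int = 1500) -> int:
    """Days needed to stay inside the free-tier daily request quota."""
    return math.ceil(num_tweets / daily_limit)
```

For example, `free_tier_days(10_000)` returns `7`, matching the estimate above.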
### What happens if I exceed API quota?

When quota is exceeded, the tool:

- Detects rate limit or quota errors (429, quota exceeded)
- Retries up to 3 times with exponential backoff
- Saves a checkpoint before stopping
- Shows an error message

To recover:

- Wait until your quota resets (usually the next day)
- Run the analyze command again
- Processing resumes from the checkpoint

To avoid hitting quota:

- Increase `RATE_LIMIT_SECONDS` to slow down requests
- Spread analysis over multiple days
- Consider upgrading to a paid API tier for large volumes
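The retry behavior described above is a standard exponential-backoff pattern. This is a generic sketch: the delay values and exception type are illustrative, not the project's exact ones.

```python
import time

class QuotaExceededError(Exception):
    """Stand-in for a 429 / quota-exceeded API error."""

def call_with_backoff(request, max_retries: int = 3, base_delay: float = 1.0):
    """Retry up to max_retries times, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except QuotaExceededError:
            if attempt == max_retries:
                raise  # out of retries: caller saves a checkpoint and stops
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```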
### Can I analyze 100,000+ tweets?

Yes, but it:

- Takes multiple days with free-tier limits
- Requires patience (spread over 67+ days at 1,500/day)
- Has a higher chance of interruptions

Strategies for large archives:

- Use a paid API tier for faster processing
- Filter before analyzing: modify the archive to include only recent tweets (e.g., last 3 years)
- Sample analysis: analyze every 10th tweet for a quick overview
- Run continuously: let it run in the background over weeks
## Security and privacy

### Is my data secure?

Yes. File permissions:

- All output files use `0o600` (owner-only read/write)
- Directories use `0o750` (owner read/write/execute; group read/execute)
- This prevents unauthorized access on shared systems
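In Python those modes are applied with `os.chmod`. This is a sketch of the pattern, not the project's actual code (which lives in `src/storage.py`):

```python
import os
from pathlib import Path

def write_private(path: str, content: str) -> None:
    """Write a file with the restrictive modes described above."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(p.parent, 0o750)  # owner rwx, group r-x, others none
    p.write_text(content, encoding="utf-8")
    os.chmod(p, 0o600)         # owner read/write only
```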
Local processing:

- API keys are loaded from `.env` (never committed to git)
- The `data/` directory is gitignored (won’t be committed)
- No cloud storage or external logging
- All processing happens locally

Data sent to the AI:

- Tweet text and ID only
- Your criteria from `config.json`
- No personal information beyond tweet content

See `src/storage.py:8-9`.

### Should I commit config.json to git?
Committing is fine if your criteria are:

- Generic professional criteria
- Standard content filtering rules
- In a public repository anyway

Keep it private if:

- Your criteria reveal personal concerns
- It contains sensitive keywords or topics
- You have private reasons for the cleanup

Add `config.json` to `.gitignore` if your criteria are personal or sensitive.

### What data is sent to Google Gemini?
Sent with each request:

- Tweet ID (e.g., “1234567890”)
- Tweet text content
- Your criteria from `config.json`
- A structured prompt with analysis instructions
- Your API key (used for authentication only)

Never sent:

- Your X username
- Tweet metadata (likes, retweets, etc.)
- Other tweets in your archive
- Any local file paths

Received back:

- Decision: “DELETE” or “KEEP”
- Reason: brief explanation
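Handling such a response might look like the following sketch. The `DECISION: ... / REASON: ...` text layout is an assumption for illustration; the project's real parsing is in `src/analyzer.py`.

```python
def parse_verdict(response_text: str) -> tuple[str, str]:
    """Extract the decision and reason from a model reply.

    Assumes an illustrative 'DECISION: ...' / 'REASON: ...' layout,
    not necessarily the project's actual response format.
    """
    decision, reason = "KEEP", ""
    for line in response_text.splitlines():
        if line.upper().startswith("DECISION:"):
            decision = line.split(":", 1)[1].strip().upper()
        elif line.upper().startswith("REASON:"):
            reason = line.split(":", 1)[1].strip()
    return decision, reason
```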
See `src/analyzer.py:107-136`.

## Development and contributing
### How do I run the tests?

### How can I contribute to the project?

Fork the repository, make your changes, and open a pull request.

### What's the project structure?

- `main.py`: CLI commands using Click
- `application.py`: Coordinates extraction and analysis
- `analyzer.py`: Gemini API client with retry logic
- `storage.py`: Parsers and writers for JSON/CSV
- `config.py`: Settings and criteria management
## Still have questions?

If your question isn’t answered here:

- Check the README for detailed documentation
- Search existing issues on GitHub
- Open a new issue with your question