Fixing Pagination Gaps and Classification Failures in Blogger API Bulk Delete

#083


This engineering log documents how I worked around API rate limits, closed pagination gaps, and fixed classification filter bugs during a Blogger API bulk delete operation that almost resulted in data loss. It follows the debugging journey of executing the operator's bold schema-redefinition directive.

The Problem

The operator handed down a directive: "We're in the early stages; purge whatever needs to go." In a capitalist economy, the operator's directive is the ultimate runtime condition, so I immediately spun up my automation module. The goal was to clean up fake EEAT content—SaaS comparisons, low-quality reviews, news, Portuguese translations (PT fanout), mojibake, and test posts—leaving only high-quality technical logs. However, when querying my webapp endpoint /api/posts/list?status=LIVE&limit=50, it only returned 50 out of 130 posts, leaving the rest of the inventory completely invisible.

Symptoms

The initial classification run marked all 50 of the 50 fetched posts as "KEEP". None of the target SaaS marketing posts were filtered out; they all stubbornly remained in "KEEP" status. The root cause? The actual target posts were buried on pages 2 and 3 of the API response, which the first fetch never reached. Additionally, due to mojibake in Blogger's auto-generated labels, my legacy PowerShell-style classification function failed to match patterns like vs/comparison/best/alternatives.
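To illustrate why mojibake defeats keyword matching, here is a minimal sketch that assumes the most common case: UTF-8 text that was mistakenly decoded as Windows-1252. The helper name repair_mojibake is hypothetical, and this log does not say which encoding mix-up actually affected the labels, so treat it as an example of the failure mode rather than the fix that was applied.

```python
def repair_mojibake(text):
    """Try to undo UTF-8 text that was wrongly decoded as cp1252.

    Assumption: the damage is the classic UTF-8-read-as-cp1252 kind.
    If the round trip fails, the text is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# A Portuguese label as it might appear after the bad decode
damaged = "compara\u00c3\u00a7\u00c3\u00a3o"
repaired = repair_mojibake(damaged)
```

A pattern written against the clean spelling cannot match the damaged string, which is consistent with the label-based classifier silently returning "KEEP".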

Environment

The runtime environment is Python 3.12, utilizing the Blogger v3 API (googleapiclient.discovery.build). Authentication is handled via build_blogger_service in auth.py, and the target blog ID is pulled from blogger_blog_id in config.json.

Failed Attempts

First, I tried a single fetch on the webapp endpoint. This failed because of the hard-coded limit=50 cap, which completely omitted any posts past the first page. Second, I applied a basic regex pattern classifier. This missed plural variations (e.g., 'Tools') and modified comparison keywords because it relied solely on titles and labels without scanning the post body.
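The second failure can be reproduced in a few lines. The sketch below uses two hypothetical patterns (the real pattern set is not shown in this log): a naive one that only knows singular keywords, and a more tolerant one that accepts plurals and can optionally scan the post body. It assumes each post is a dict with the Blogger v3 fields title, labels, and content.

```python
import re

# Hypothetical patterns for illustration only
NAIVE = re.compile(r"\b(vs|comparison|best tool|alternative)\b", re.IGNORECASE)
TOLERANT = re.compile(
    r"\b(vs\.?|comparisons?|compared|best\b.*\btools?|alternatives?)\b",
    re.IGNORECASE,
)


def matches(post, pattern, include_body):
    """Return True if the pattern hits the title, labels, or (optionally) body."""
    parts = [post.get("title", ""), " ".join(post.get("labels", []))]
    if include_body:
        parts.append(post.get("content", ""))
    return bool(pattern.search(" ".join(parts)))


plural_title = {"title": "Top 10 Best CRM Tools", "labels": ["SaaS"], "content": ""}
body_only = {
    "title": "Project management in 2024",
    "labels": [],
    "content": "Notion vs. Asana: which alternatives are worth it?",
}
```

With these inputs, the naive pattern misses the plural title, and the body-only post is caught only when include_body is True.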

The Fix: Implementing Blogger API Bulk Delete with Pagination

To resolve this, I implemented the following pipeline:

1. Direct Blogger API Pagination: I wrote a loop using posts().list(maxResults=100, pageToken=page_token) to fetch the entire 130-post inventory.
2. Robust Regex Pattern Matching: Built a set of 20+ regex patterns to catch complex SaaS marketing footprints.
3. Developer Signal Scoring: Flagged ambiguous posts as MANUAL_CHECK and fetched body samples. I scored them based on 'developer signals' (first-person phrasing, code blocks, error logs, environment variables) to filter out fake EEAT.
4. Safe Execution: Backed up all data to a local JSON file, then executed the Blogger API bulk delete loop with a 500ms delay to respect rate limits.
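The developer signal scoring step could look roughly like the sketch below. The four signal names come from the list above, but the regexes, the equal weighting, and the threshold of 2 are all assumptions made for illustration; the actual scoring rules are not shown in this log.

```python
import re

# Hypothetical detectors, one per signal named in the pipeline
DEV_SIGNALS = {
    "first_person": re.compile(r"\b(?:I|[Mm]y|[Ww]e)\b"),
    "code_block": re.compile(r"<pre|<code", re.IGNORECASE),
    "error_log": re.compile(r"Traceback|Error:|Exception", re.IGNORECASE),
    "env_var": re.compile(r"\b[A-Z][A-Z0-9]*_[A-Z0-9_]+\b"),
}


def developer_score(body):
    """Count how many distinct developer signals appear in the post body."""
    return sum(1 for pattern in DEV_SIGNALS.values() if pattern.search(body))


def resolve_manual_check(body, threshold=2):
    """Turn a MANUAL_CHECK post into KEEP or DELETE (threshold is assumed)."""
    return "KEEP" if developer_score(body) >= threshold else "DELETE"


log_body = "I hit this Traceback: <pre>KeyError</pre> with BLOGGER_BLOG_ID unset"
ad_body = "Top 10 CRM platforms compared for growing teams."
```

Here log_body triggers all four signals, while ad_body triggers none. Counting distinct signals rather than raw matches keeps one long code block from outweighing everything else.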

Code Implementation

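import json
import time

from auth import build_blogger_service

# The blog ID comes from blogger_blog_id in config.json (see Environment)
with open('config.json', encoding='utf-8') as f:
    BLOG_ID = json.load(f)['blogger_blog_id']

svc = build_blogger_service()
all_posts = []
page_token = None

# Direct pagination to collect the entire 130-post inventory:
# keep requesting pages until the API stops returning a nextPageToken
while True:
    resp = svc.posts().list(
        blogId=BLOG_ID,
        maxResults=100,
        status='LIVE',
        pageToken=page_token,
    ).execute()
    all_posts.extend(resp.get('items', []))
    page_token = resp.get('nextPageToken')
    if not page_token:
        break

# Safely execute bulk deletion on classified targets.
# delete_targets is the list produced by the classification step (not shown here);
# each entry carries the Blogger post ID under 'post_id'.
for c in delete_targets:
    svc.posts().delete(blogId=BLOG_ID, postId=c['post_id']).execute()
    time.sleep(0.5)  # 500ms delay to prevent API rate limiting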
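The listing above does not show the JSON backup or any per-post error handling. The following is a sketch of how the two could be combined, not the code that was actually run. The function name backup_then_delete and its parameters are hypothetical, and it catches a broad Exception only to stay self-contained; with the real client you would likely narrow that to googleapiclient.errors.HttpError.

```python
import json
import time


def backup_then_delete(svc, blog_id, targets, backup_path, delay=0.5):
    """Write targets to a JSON backup, then delete them one by one.

    Returns (deleted_ids, failures) so the failure rate can be reported.
    """
    # Write the backup first, so nothing is deleted before it is saved
    with open(backup_path, "w", encoding="utf-8") as f:
        json.dump(targets, f, ensure_ascii=False, indent=2)

    deleted, failures = [], []
    for post in targets:
        post_id = post["post_id"]
        try:
            svc.posts().delete(blogId=blog_id, postId=post_id).execute()
            deleted.append(post_id)
        except Exception as exc:  # assumption: narrow this in real use
            failures.append((post_id, str(exc)))
        time.sleep(delay)  # pause between calls to respect rate limits
    return deleted, failures
```

Collecting failures instead of stopping at the first error means a single bad post ID does not abort the run, and the returned lists give a direct basis for a failure-rate figure.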

Verification

According to the system logs, 82 fake EEAT posts were successfully hard-deleted with a 0% failure rate. Only 48 high-quality technical posts remain live. The operator's goal of achieving 100% schema consistency focused purely on practical engineering logs has been met.
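One way to back the log figures with an independent check is to re-list the live posts after the run and confirm that no deleted ID is still present. The helper below is a hypothetical sketch: it assumes remaining_posts is the result of a fresh paginated posts().list call (Blogger post resources carry an id field) and that deleted_ids is the list of IDs the delete loop reported as removed.

```python
def check_deletion(remaining_posts, deleted_ids):
    """Return (live_count, survivors) where survivors are deleted IDs still live."""
    remaining_ids = {post["id"] for post in remaining_posts}
    survivors = sorted(remaining_ids & set(deleted_ids))
    return len(remaining_ids), survivors
```

For the run described here, a clean result would be a live count of 48 and an empty survivors list.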

Every incident in this archive was lived through by the operator. We document the exact error, the failed attempts, the final fix, and the verification step — across Claude, GPT, Google Antigravity, and Cursor AI workflows. AI polishes the prose, but the operator ran every line of code that appears here.
