Theory idea for ai

Brainy · Aug 10, 2025

robot internally draws a virtual 3D map of everything it sees in real time; a secondary ai (agentic) uses reasoning and math to predict distances while drawing the 3D map

now that it has a map of what it sees, the secondary ai determines the most likely place to carry out the prompted task and the steps to get there

for example, "go to the kitchen and retrieve a spoon"

computer vision has this 3D map of the house

ai recognizes what a kitchen looks like from training data

stands in the kitchen

knows a spoon is likely in a drawer, and would be a single drawer at the topmost level, because reasoning

it opens several drawers and updates the 3D map in real time, simultaneously recognizing the contents in each drawer and internally deploying agents to analyze and debate on the objects seen

it uses that data and updates a workflow that it holds to memory

maybe this could bypass having to train an entirely new model

zd60 · Aug 10, 2025

I think this is what google wants to do with Genie 3, sort of. The issue is that while these tasks seem simple, they require quite a bit of long-horizon task ability and awareness that the computers lack

Brainy · Aug 10, 2025

zd60 said:
I think this is what google wants to do with Genie 3, sort of. The issue is that while these tasks seem simple, they require quite a bit of long-horizon task ability and awareness that the computers lack

let's see if gpt-5 has a solution

Goal

Give the robot just enough long-horizon ability using simple, reliable pieces (no giant new model).

Architecture (minimal)

1. Task Graph (HTN-lite): Decompose goal → ordered/substitutable steps.

2. Skills Registry: Reusable atomic skills (navigate, open, detect, grasp).

3. Blackboard Memory: Shared state (goal, map, beliefs, checklist, timers).

4. Event FSM: Simple state machine with triggers (success, timeout, low-conf, collision).

5. Belief Tracker: P(object | place/container). Updates after each observation.

6. Replanner: Only fires on triggers; otherwise stick to plan (prevents thrash).

Representations (tiny but powerful)

Checklist: [reach kitchen] → [scan counters] → [open top drawers left→right] → [inspect utensil caddy] → [expand radius].

Belief table: {top_drawer: 0.55, utensil_caddy: 0.25, dish_rack: 0.1, others: 0.1}.

Stop rules: success OR belief_max < θ → widen search.

Failure codes: {NO_GRASP, NO_OPEN, NO_VISUAL, PATH_BLOCKED} → mapped to specific fixes.
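
A minimal sketch of how the belief table above could be updated after an unsuccessful look, assuming a plain Bayes update with a fixed detector miss rate. The function names, the `miss_rate` value and the `theta` default are illustrative assumptions and are not specified anywhere in the post.

```python
def update_after_miss(beliefs, searched, miss_rate=0.1):
    """Bayes update after looking in `searched` and not seeing the object.

    miss_rate is an assumed P(no detection | object really is there).
    """
    unnormalized = {
        place: p * (miss_rate if place == searched else 1.0)
        for place, p in beliefs.items()
    }
    total = sum(unnormalized.values())
    return {place: p / total for place, p in unnormalized.items()}


def should_widen_search(beliefs, theta=0.5):
    """Stop rule from the text: widen the search when the best belief is below theta."""
    return max(beliefs.values()) < theta


# Belief table taken from the post; the spoon was not seen in the top drawer.
beliefs = {"top_drawer": 0.55, "utensil_caddy": 0.25, "dish_rack": 0.10, "others": 0.10}
after = update_after_miss(beliefs, "top_drawer")
```

After the miss, probability mass shifts away from the top drawer and toward the remaining places, which is what "Belief↓ and continue" amounts to.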

Control loop (pseudo)

1. Read next checklist item.

2. Run skill; log outcome + confidence.

3. Update beliefs + blackboard.

4. If trigger → call replanner; else continue.

5. On success → compress trace to memory (home-specific prior update).
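
The loop above could be sketched roughly as follows. This is only one possible reading: `StepResult`, `Blackboard`, `run_checklist` and the `replan` callback are hypothetical names, and the belief update of step 3 is left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    ok: bool
    confidence: float


@dataclass
class Blackboard:
    checklist: list[str]
    log: list[tuple[str, bool, float]] = field(default_factory=list)
    replans: int = 0


def run_checklist(board, skills, replan, min_conf=0.6, max_replans=3):
    """Run checklist items in order; call the replanner only when a trigger fires."""
    i = 0
    while i < len(board.checklist):
        step = board.checklist[i]
        result = skills[step]()                      # run the skill
        board.log.append((step, result.ok, result.confidence))
        if result.ok and result.confidence >= min_conf:
            i += 1                                   # no trigger: stick to the plan
            continue
        if board.replans >= max_replans:             # budget cap on replanning
            return False
        board.replans += 1
        # The replanner returns a new list of remaining steps.
        board.checklist = board.checklist[:i] + replan(step, board.checklist[i:])
    return True
```

Here `skills` maps a checklist item to a zero-argument callable, and `replan(failed_step, remaining)` decides what the rest of the checklist should look like.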

Triggers (keep it small)

Timeout per step (e.g., 20–40s).

Low confidence (<0.6 detection or belief).

Force anomaly while opening.

Localization drift (>0.3 m) → quick relocalize.
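
The four triggers can be checked with a small pure function. This is a sketch under assumptions: the field names are made up, the 30 s timeout is just one value inside the 20–40s range, and the force limit is a placeholder because the post gives no number for it.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Trigger(Enum):
    TIMEOUT = auto()
    LOW_CONFIDENCE = auto()
    FORCE_ANOMALY = auto()
    LOCALIZATION_DRIFT = auto()


@dataclass
class StepStatus:
    elapsed_s: float    # time spent on the current step
    confidence: float   # detection or belief confidence
    force_n: float      # measured force while opening (hypothetical sensor reading)
    drift_m: float      # estimated localization drift


def fired_triggers(s, timeout_s=30.0, min_conf=0.6, max_force_n=40.0, max_drift_m=0.3):
    """Return every trigger that fires for this step status, in a fixed order."""
    fired = []
    if s.elapsed_s > timeout_s:
        fired.append(Trigger.TIMEOUT)
    if s.confidence < min_conf:
        fired.append(Trigger.LOW_CONFIDENCE)
    if s.force_n > max_force_n:          # max_force_n is an assumed limit
        fired.append(Trigger.FORCE_ANOMALY)
    if s.drift_m > max_drift_m:
        fired.append(Trigger.LOCALIZATION_DRIFT)
    return fired
```

An empty list means the robot keeps following the current plan; anything else is handed to the replanner or to a quick relocalize.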

Reasoning shortcuts (cheap wins)

Affordance priors: “Spoon ↦ top drawer near sink/stove” (few dozen rules cover most objects).

Container ordering: top → middle → bottom; left→right.

Active perception: change pose if detection <0.6, then re-score.

Exploration bandit: pick next container by UCB on belief (fast, no full POMDP).

Memory bias: exponential moving average per home (fast personalization).
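
A rough sketch of the exploration bandit and the memory bias, assuming a UCB-style score (belief plus an exploration bonus that shrinks with visits) and a standard exponential moving average. The constant `c`, the `alpha` value and both function names are assumptions for illustration.

```python
import math


def pick_container(beliefs, visits, c=0.5):
    """Pick the next container by a UCB-style score: belief + exploration bonus."""
    total = sum(visits.values()) + 1

    def score(name):
        bonus = c * math.sqrt(math.log(total + 1) / (visits.get(name, 0) + 1))
        return beliefs[name] + bonus

    return max(beliefs, key=score)


def ema_update(prior, found, alpha=0.2):
    """Per-home memory bias: nudge the prior toward 1 if found here, toward 0 if not."""
    return (1 - alpha) * prior + alpha * (1.0 if found else 0.0)
```

With no visits yet, every container gets the same bonus, so the highest belief wins. Once a container has been opened many times, a slightly less likely but unvisited one can overtake it.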

Example: “Get a spoon” (FSM)

1. GoTo(Kitchen) → Verify(Scene=Kitchen).

2. OpenSet = TopDrawers(kitchen, left→right).

3. For d in OpenSet: Open(d) → Detect(spoon) → if found: Grasp+Deliver; else Belief↓ and continue.

4. If none: Check(UtensilCaddy) → Check(DishRack) → Expand(SearchRadius=1.5m).

5. On any failure code → specific recovery (regrasp, re-open with higher force limit, change camera angle).
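
Step 5 maps each failure code to a specific recovery. A minimal sketch of that lookup is below. The three recoveries named in the example are used as given, while the `PATH_BLOCKED` recovery, the retry cap and the `escalate_to_replanner` fallback are assumptions.

```python
from enum import Enum


class Failure(Enum):
    NO_GRASP = "NO_GRASP"
    NO_OPEN = "NO_OPEN"
    NO_VISUAL = "NO_VISUAL"
    PATH_BLOCKED = "PATH_BLOCKED"


RECOVERY = {
    Failure.NO_GRASP: "regrasp",
    Failure.NO_OPEN: "reopen_with_higher_force_limit",
    Failure.NO_VISUAL: "change_camera_angle",
    Failure.PATH_BLOCKED: "replan_path",  # assumption: the post names no fix for this code
}


def recover(failure, attempts, max_attempts=2):
    """Return the recovery action, or escalate once the retry budget is spent."""
    if attempts >= max_attempts:
        return "escalate_to_replanner"    # hypothetical fallback name
    return RECOVERY[failure]
```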

Guardrails

Budget caps: max drawers=8, max replans=3.

Safety envelope: collision watchdog + dynamic obstacle veto.

Explainability: keep a step log (“why here?” → “highest belief: 0.55 top drawer”).

Training/No-training mix

Use off-the-shelf vision (open-vocab detection/segmentation).

Tiny adapters only where needed (handle detection, grasp scorer).

Learn priors from experience (no end-to-end retraining).

Metrics (to know it works)

SR@N: success within N container opens.

Time-to-first-sighting.

# replans / task.

Collision & force limit trips.
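
SR@N can be computed directly from episode logs. A small sketch, assuming each episode is recorded as the number of container opens needed to find the object, or `None` if it was never found (that log format is an assumption):

```python
def success_rate_at_n(opens_per_episode, n):
    """Fraction of episodes that succeeded within n container opens."""
    if not opens_per_episode:
        return 0.0
    hits = sum(1 for opens in opens_per_episode if opens is not None and opens <= n)
    return hits / len(opens_per_episode)


# Four hypothetical episodes: found after 1, 3 and 5 opens, and one failure.
episodes = [1, 3, None, 5]
sr_at_3 = success_rate_at_n(episodes, 3)
sr_at_8 = success_rate_at_n(episodes, 8)
```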

Implementation order (fast path)

1. FSM + checklist + skills registry.

2. Belief table + UCB container selection.

3. Four triggers + tiny replanner.

4. Memory of successful locations per home.

5. Add more priors only if metrics stall.

zd60 · Aug 10, 2025

Brainy said:
let's see if gpt-5 has a solution

interesting what prompt did you feed it? did it use the thinking mode? chatgpt 5 is hit or miss with its router functionality

Brainy · Aug 10, 2025

zd60 said:
interesting what prompt did you feed it? did it use the thinking mode? chatgpt 5 is hit or miss with its router functionality

gpt-5 thinking

prompt: "The issue is while these task seem simple they require quite a bit of long horizon task ability and awareness that the computers lack"

Brainy · Aug 10, 2025

@zd60

prompt: "my colleague said "chatgpt 5 is hit or miss with its router functionality" to this reply, what did he mean by that"

Meaning

“Hit or miss” = inconsistent.

“Router functionality” = the part that decides which tool/skill/agent to use next.

Two common “router” uses

1. Tool/function routing: pick between calling code, search, database, or a robot skill (navigate, open, grasp).

2. Expert/model routing: pick which sub-model/agent handles a subtask (mixture-of-experts).

Why it feels inconsistent

Tools overlap; triggers aren’t crystal-clear.

Prompts vary; small wording changes flip choices.

Long context → the model forgets constraints.

It may hallucinate instead of calling a tool or call the wrong one.

Temperature/stochasticity adds randomness.

Symptoms you’d see

Calls no tool when it should (e.g., “search” not invoked).

Calls the wrong skill (scans counters before checking drawers).

Loops/retries needlessly; ignores schemas/args.

In your robot example

Router must choose: Navigate → OpenDrawer → Detect → Grasp.

“Hit or miss” = it sometimes picks a poorer order, skips verification, or grasps without a solid detection.

Mitigations (practical)

1. Tight tool specs with mutually exclusive triggers (“If drawer closed → OpenDrawer, else Detect”).

2. Small rule/FSM front-end as the primary router; use the LLM only for ambiguous cases.

3. Few-shot examples showing correct tool choice and argument shapes.

4. Limit the tool set per task; remove near-duplicates.

5. Lower temperature; enforce max tool calls; add timeouts.

6. Validate choices (ask model to state: “I’m calling OpenDrawer because …”; auto-check the reason).

7. Backoff plan: if a call fails, retry with structured error; otherwise fall back to a safe default order.

8. Evals: create router tests (same goal, varied wording) and track accuracy over time.
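
Mitigations 1, 2 and 7 could look roughly like this: a deterministic rule router goes first, and the LLM is consulted only when the rules do not decide. `llm_suggest` is a hypothetical callable standing in for a model call, and the state keys are made up for the sketch.

```python
from typing import Callable, Optional

ALLOWED_TOOLS = {"Navigate", "OpenDrawer", "Detect", "Grasp"}
SAFE_DEFAULT = "Detect"     # assumed safe fallback: look again rather than act
MIN_GRASP_CONF = 0.6


def rule_router(state: dict) -> Optional[str]:
    """Mutually exclusive rules; returns None when the case is ambiguous."""
    if not state.get("in_kitchen", False):
        return "Navigate"
    if state.get("drawer_closed", True):
        return "OpenDrawer"
    if "detection_conf" not in state:
        return "Detect"
    return None


def route(state: dict, llm_suggest: Callable[[dict], str]) -> str:
    """Rules first; the LLM is advisory and its suggestion is validated."""
    choice = rule_router(state)
    if choice is not None:
        return choice
    suggestion = llm_suggest(state)
    if suggestion not in ALLOWED_TOOLS:
        return SAFE_DEFAULT
    # Never grasp without a solid detection, whatever the model says.
    if suggestion == "Grasp" and state["detection_conf"] < MIN_GRASP_CONF:
        return SAFE_DEFAULT
    return suggestion
```

Router evals (mitigation 8) then reduce to calling `route` on a table of states and expected tools and tracking the pass rate over time.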

Bottom line

Treat the LLM router as advisory. Keep a deterministic backbone for safety-critical steps.

zd60 · Aug 10, 2025

Brainy said:
gpt-5 thinking

prompt: "The issue is while these task seem simple they require quite a bit of long horizon task ability and awareness that the computers lack"

fascinating, mine just spewed out a list of things that are possible

zd60 · Aug 10, 2025

zd60 said:
fascinating, mine just spewed out a list of things that are possible

1) Hierarchical planning (planner → executor).
Decompose the goal into a DAG of sub-tasks with clear exit criteria. Let the model update a living “plan of record” instead of winging it.

2) External, persistent memory.
Keep facts, decisions, and open loops in a project workspace (notes/KB + vector search). Summaries + snapshots prevent context drift and allow resume after interruptions.

3) Tool use and stateful actions.
Give the AI functions/APIs (calendar, files, email, spreadsheets, web, databases) so it can act and read/write state, not just chat.

4) Progress checks and invariants.
Attach checklists, tests, and assertions to each sub-task (e.g., “budget column sums to total,” “dates don’t conflict”). Use an automatic “critic” step to verify outputs before moving on.

5) World/model of the task.
Encode constraints and relationships (deadlines, dependencies, capacities) as a lightweight schema/knowledge graph so the system can simulate consequences and detect conflicts early.
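
The two example invariants from point 4 can be written as plain check functions. A minimal sketch, where the function names, the tolerance and the (start, end) booking format are assumptions:

```python
from datetime import date


def budget_sums_to_total(rows, total, tol=0.01):
    """Invariant: the budget column adds up to the stated total."""
    return abs(sum(rows) - total) <= tol


def dates_do_not_conflict(bookings):
    """Invariant: no two (start, end) bookings overlap."""
    ordered = sorted(bookings)
    return all(
        prev_end <= next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


def failed_checks(checks):
    """Run named checks and return the names of the ones that fail."""
    return [name for name, check in checks.items() if not check()]


bookings = [(date(2025, 8, 10), date(2025, 8, 12)), (date(2025, 8, 11), date(2025, 8, 13))]
failures = failed_checks({
    "budget": lambda: budget_sums_to_total([100.0, 250.0, 50.0], 400.0),
    "dates": lambda: dates_do_not_conflict(bookings),
})
```

A sub-task would only be marked done when `failed_checks` returns an empty list.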

Brainy · Aug 10, 2025

zd60 said:
1) Hierarchical planning (planner → executor).
Decompose the goal into a DAG of sub-tasks with clear exit criteria. Let the model update a living “plan of record” instead of winging it.

2) External, persistent memory.
Keep facts, decisions, and open loops in a project workspace (notes/KB + vector search). Summaries + snapshots prevent context drift and allow resume after interruptions.

3) Tool use and stateful actions.
Give the AI functions/APIs (calendar, files, email, spreadsheets, web, databases) so it can act and read/write state, not just chat.

4) Progress checks and invariants.
Attach checklists, tests, and assertions to each sub-task (e.g., “budget column sums to total,” “dates don’t conflict”). Use an automatic “critic” step to verify outputs before moving on.

5) World/model of the task.
Encode constraints and relationships (deadlines, dependencies, capacities) as a lightweight schema/knowledge graph so the system can simulate consequences and detect conflicts early.

i've set my settings so it's more adhd friendly and uses bullet points

zd60 · Aug 10, 2025

Brainy said:
i've set my settings so it's more adhd friendly and uses bullet points

Oh I barely use mine, I just have the subscription for this month to try it. I stick to local models or other platforms

Brainy · Aug 10, 2025

zd60 said:
Oh I barely use mine, I just have the subscription for this month to try it. I stick to local models or other platforms

i gotcha, it's been so long since i've played around with local models

i should see what thebloke has uploaded recently

Brainy · Aug 10, 2025

zd60 said:
1) Hierarchical planning (planner → executor).
Decompose the goal into a DAG of sub-tasks with clear exit criteria. Let the model update a living “plan of record” instead of winging it.

2) External, persistent memory.
Keep facts, decisions, and open loops in a project workspace (notes/KB + vector search). Summaries + snapshots prevent context drift and allow resume after interruptions.

3) Tool use and stateful actions.
Give the AI functions/APIs (calendar, files, email, spreadsheets, web, databases) so it can act and read/write state, not just chat.

4) Progress checks and invariants.
Attach checklists, tests, and assertions to each sub-task (e.g., “budget column sums to total,” “dates don’t conflict”). Use an automatic “critic” step to verify outputs before moving on.

5) World/model of the task.
Encode constraints and relationships (deadlines, dependencies, capacities) as a lightweight schema/knowledge graph so the system can simulate consequences and detect conflicts early.

i prompted with that entire response:

Quick take

- Strong foundations. Add a few pieces to make it reliable at long horizons.

Additions (fill the gaps)

6) Belief/uncertainty tracking → keep P(state) and update after each action.
7) Failure & recovery policy → retry/backoff trees, timeouts, fallbacks.
8) Safety & permissions → scopes, rate/cost budgets, human approvals.
9) Telemetry & evals → traces, regressions, router/tool accuracy tests.
10) Scheduler/priorities → queue, SLAs, preemption across tasks.

Implementation (per your 5)

1. Hierarchical planning
  - Represent plan as a DAG with node types: {goal, step, check, decision}.
  - Exit criteria = assertions; store status per node; replan only on triggers.
2. External, persistent memory
  - KV store (facts), vector store (notes/docs), timeline (decisions/open loops).
  - Nightly snapshots + short rolling summary for context injection.
3. Tool use & stateful actions
  - Strict function specs (arguments, idempotency, side-effect notes).
  - Transaction log: {tool, input, output, result, cost, latency}.
4. Progress checks & invariants
  - Pre/post hooks: validate(), critic() before advancing DAG edge.
  - Auto-tests per step; block merge on any failed assertion.
5. World/model of the task
  - Lightweight schema: entities, constraints, dependencies, capacities.
  - Fast solver: detect conflicts early; propose fixes (shift dates, split work).

Minimal data contracts (keep it simple)

- PlanOfRecord: {goal, nodes:[{id,type,owner,inputs,assertions,status}], edges:[(id→id)]}.
- Memory: {facts[], notes[], decisions[], open_loops[], snapshot}.
- ToolCall: {name, args, preconds, postconds, retries, result, error}.
- Constraint: {subject, relation, object, window, priority}.

Control loop (steady, not chatty)

1. Expand next frontier node(s) in DAG.
2. Execute tool/step.
3. Update memory + beliefs.
4. Run critic/assertions; if fail → recovery policy; else advance.
5. Emit trace + metrics; repeat.

Guardrails

- Budgets (time/cost/tool quotas), permissions, PII redaction, rollback on failed postconds.

Metrics to track

- Task SR, step SR, mean retries, critic catch rate, router accuracy, cost/time per task.

Fast stack (example)

- Orchestrator (FSM/HTN), DAG lib, KV + vector DB, function router, OR-Tools/CP-SAT for constraints, tracing/evals harness.
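
A sketch of the PlanOfRecord contract and the "expand next frontier node(s)" step, assuming the frontier is every pending node whose predecessors are all done. Only a subset of the listed fields is modelled, and the status strings are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    type: str                 # one of: goal, step, check, decision
    status: str = "pending"   # assumed values: pending, done, failed


@dataclass
class PlanOfRecord:
    goal: str
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (from_id, to_id)

    def add(self, node, after=()):
        """Add a node that depends on every node id listed in `after`."""
        self.nodes[node.id] = node
        self.edges.extend((dep, node.id) for dep in after)

    def frontier(self):
        """Pending nodes whose dependencies are all done."""
        blocked = {dst for src, dst in self.edges if self.nodes[src].status != "done"}
        return [n.id for n in self.nodes.values()
                if n.status == "pending" and n.id not in blocked]


plan = PlanOfRecord(goal="retrieve a spoon")
plan.add(Node("reach_kitchen", "step"))
plan.add(Node("open_top_drawers", "step"), after=["reach_kitchen"])
plan.add(Node("inspect_caddy", "step"), after=["reach_kitchen"])
plan.add(Node("spoon_found", "check"), after=["open_top_drawers", "inspect_caddy"])
```

The control loop would call `plan.frontier()`, execute those nodes, set their `status`, and repeat until the frontier is empty.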

zd60 · Aug 10, 2025

Brainy said:
i gotcha, it's been so long since i've played around with local models

i should see what thebloke has uploaded recently

Local models have come quite a long way; the chinese models do the best so far. I've played around a bit with gpt-oss, but it seems to be heavily trained on synthetic data like the phi models and kinda loses coherence fast outside of tool calls and instruction-heavy tasks

zd60 · Aug 10, 2025

Brainy said:
i prompted with that entire response:

Pretty interesting. I do think that these models are just the stepping stones to a future full of real AGI/ASI and that LLMs won't be going anywhere, but we will need some sort of new breakthrough in technology to get past the hurdles I think we will hit in the next s-curve

Brainy · Aug 10, 2025

zd60 said:
Local models have come quite a long way; the chinese models do the best so far. I've played around a bit with gpt-oss, but it seems to be heavily trained on synthetic data like the phi models and kinda loses coherence fast outside of tool calls and instruction-heavy tasks

i will have to check them out

zd60 · Aug 10, 2025

Brainy said:
i will have to check them out

Unless you have a really good computer you'll probably be restricted to the gpt-oss-20b model, because the 120b model might be a little tough to run. It can still be good for some tasks, it's just more akin to the phi models and seems heavily synthetically trained

Brainy · Aug 10, 2025

zd60 said:
Unless you have a really good computer you'll probably be restricted to the gpt-oss-20b model, because the 120b model might be a little tough to run. It can still be good for some tasks, it's just more akin to the phi models and seems heavily synthetically trained

i have a 3090 but that's it

zd60 said:
Pretty interesting. I do think that these models are just the stepping stones to a future full of real AGI/ASI and that LLMs won't be going anywhere, but we will need some sort of new breakthrough in technology to get past the hurdles I think we will hit in the next s-curve

we'll see where things go, especially with google and china

zd60 · Aug 10, 2025

Brainy said:
i have a 3090 but that's it

we'll see where things go, especially with google and china

I run a 3090 too.

google has more than they are letting on. OpenAI still has a small lead overall though, across most benchmarks and use cases.

Brainy · Aug 10, 2025

zd60 said:
I run a 3090 too.

google has more than they are letting on. OpenAI still has a small lead overall though, across most benchmarks and use cases.

google definitely has something up their sleeve, and i think zuck knows it if he's frantically paying that much money for talent
