Extract JSON from the terminal

You

A web page makes an API call, gets some JSON back, and displays it on screen. Using the terminal, how can I extract that JSON?

18/3/2026, 5:56:45 p.m.

Assistant

The best way to extract, from the terminal, the JSON that a web page fetches through an API call and then displays depends on how the page presents that JSON. Below are several scenarios and the tools you can use for each.

1. The page shows the JSON directly (for example, it is a public API)

If the URL you visit in the browser returns the JSON as plain text (as when you open https://api.example.com/data), you can fetch it with curl or wget. To format it, pipe it to jq. The -s flag hides curl's progress meter:

curl -s "https://api.example.com/data" | jq .

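If you don't have jq installed, omit it to see the raw JSON:

curl -s "https://api.example.com/data"

Or use Python's built-in formatter instead:

curl -s "https://api.example.com/data" | python3 -m json.tool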

2. The JSON is embedded in the page's HTML

Sometimes the page includes the JSON inside a <script> tag (for example, to initialize data) or inside a <pre> tag to display it. In that case you can download the HTML and extract the JSON with command-line tools.

With `grep` and `sed` (simple but fragile)

If the JSON is on a line you can identify, for example:

<script>var datos = {"clave": "valor"};</script>

You could extract it with:

curl -s "https://ejemplo.com/pagina" | grep -oP 'var datos = \K.*?;' | sed 's/;$//' | jq .

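This approach is very specific to one page, and several things can break it:

- The HTML changes.
- The JSON spans several lines.
- A string inside the JSON contains `;`, because the lazy `.*?;` stops at the first semicolon.

Also, `-P` (Perl-compatible regex) requires GNU grep. The default grep on macOS does not support it.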

With HTML-parsing tools (more robust)

pup (a command-line HTML processor that uses CSS selectors):

curl -s "https://ejemplo.com/pagina" | pup 'script text{}' | grep -o 'datos.*' ...

htmlq (like jq, but for HTML):

curl -s "https://ejemplo.com/pagina" | htmlq 'script' --text

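Then filter the text to keep only the JSON. If the data sits in a `<script type="application/json">` tag, its content is already plain JSON, and you can pipe it straight into jq.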

3. The JSON is loaded dynamically with JavaScript

If the page makes an AJAX/fetch request and then inserts the JSON into the DOM, the initial HTML does not contain the data. Running curl on the page URL will not find it. In this case you need to:

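a) Find the API URL using the browser (graphical interface)

1. Open the developer tools (F12), go to the Network tab, and reload the page.
2. Look for the request that returns the JSON. The Fetch/XHR filter helps narrow the list.
3. Copy its URL, plus any special headers it sends (such as Authorization), because you will need them too.
4. Reproduce the request from the terminal with curl, using those headers.

In Chrome and Firefox, you can also right-click the request and use its "Copy as cURL" option. This gives you the complete command, including headers and cookies.

curl -s -H "Authorization: Bearer YOUR_TOKEN" "https://api.example.com/endpoint" | jq .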

b) Use a headless browser from the terminal

Tools such as Puppeteer (Node.js) or Playwright let you automate a browser and capture its network responses. For example, with Puppeteer:

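// script.js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Register the listener BEFORE navigating; otherwise responses that
  // arrive during the initial page load are missed.
  page.on('response', async (response) => {
    const url = response.url();
    const type = response.headers()['content-type'] || '';
    // Filter by the URL you are interested in, and keep only JSON responses
    if (url.includes('api/') && type.includes('application/json')) {
      try {
        const json = await response.json();
        console.log(JSON.stringify(json, null, 2));
      } catch (err) {
        // Body unavailable (e.g. a redirect) or not valid JSON
        console.error(`Could not read JSON from ${url}: ${err.message}`);
      }
    }
  });

  try {
    // Wait until the network has been idle for 500 ms
    await page.goto('https://ejemplo.com', { waitUntil: 'networkidle0' });
    // Extra margin for requests the page fires later
    // (page.waitForTimeout was removed in recent Puppeteer versions)
    await new Promise((resolve) => setTimeout(resolve, 2000));
  } finally {
    await browser.close();
  }
})();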

After installing Puppeteer with npm install puppeteer, run it with:

node script.js

The script prints each matching response to the terminal as formatted JSON.

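4. Intercept the network traffic (more advanced)

You can use tools such as mitmproxy or tcpdump to capture HTTP requests while you browse. This is more complex, though, and usually overkill for this purpose. Most APIs use HTTPS, so tcpdump alone only sees encrypted traffic. mitmproxy can decrypt it, but only after you install its CA certificate in the browser.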

Practical summary

The most common and simplest approach is:

Identify the API URL with the browser's developer tools (Network tab).
Reproduce the request from the terminal with curl, including the headers it needs.
Format the output with jq so it is easy to read.

For example:

curl -s 'https://ejemplo.com/api/datos?param=1' -H 'Accept: application/json' | jq .

If the JSON is very large, you can redirect it to a file:

curl ... > datos.json

Then view it with less, or open it with jq:

jq . datos.json

I hope this guide helps you extract the JSON you need from the terminal.