public DOMNode::C14N(
    bool $exclusive = false,
    bool $withComments = false,
    ?array $xpath = null,
    ?array $nsPrefixes = null
): string|false

public DOMNode::C14NFile(
    string $uri,
    bool $exclusive = false,
    bool $withComments = false,
    ?array $xpath = null,
    ?array $nsPrefixes = null
): int|false

public DOMNode::cloneNode(bool $deep = false): DOMNode|false

public DOMNode::contains(DOMNode|DOMNameSpaceNode|null $other): bool

public DOMNode::getLineNo(): int

public DOMNode::getNodePath(): ?string

public DOMNode::getRootNode(?array $options = null): DOMNode

public DOMNode::hasAttributes(): bool

public DOMNode::hasChildNodes(): bool

public DOMNode::insertBefore(DOMNode $node, ?DOMNode $child = null): DOMNode|false

public DOMNode::isDefaultNamespace(string $namespace): bool

public DOMNode::isEqualNode(?DOMNode $otherNode): bool

public DOMNode::isSameNode(DOMNode $otherNode): bool

public DOMNode::isSupported(string $feature, string $version): bool

public DOMNode::lookupNamespaceURI(?string $prefix): ?string

public DOMNode::lookupPrefix(string $namespace): ?string

public DOMNode::normalize(): void

public DOMNode::removeChild(DOMNode $child): DOMNode|false

public DOMNode::replaceChild(DOMNode $node, DOMNode $child): DOMNode|false

}

Properties

actualEncoding: Deprecated. Actual encoding of the document; a read-only equivalent to encoding.
childElementCount: The number of child elements.
config: Deprecated. Configuration used when DOMDocument::normalizeDocument() is invoked.
doctype: The Document Type Declaration associated with this document.
documentElement: The DOMElement object that is the first document element. If not found, this evaluates to null.
documentURI: The location of the document, or null if undefined.
encoding: Encoding of the document, as specified by the XML declaration. This attribute is not present in the final DOM Level 3 specification, but is the only way of manipulating XML document encoding in this implementation.
firstElementChild: First child element or null.
formatOutput: Nicely formats output with indentation and extra space. This has no effect if the document was loaded with preserveWhiteSpace enabled.
implementation: The DOMImplementation object that handles this document.
lastElementChild: Last child element or null.
preserveWhiteSpace: Do not remove redundant white space. Defaults to true. Setting this to false has the same effect as passing LIBXML_NOBLANKS as the option to DOMDocument::load().
recover: Proprietary. Enables recovery mode, i.e. trying to parse documents that are not well formed. This attribute is not part of the DOM specification and is specific to libxml.
resolveExternals: Set it to true to load external entities from a doctype declaration. This is useful for including entities in your XML documents.
standalone: Deprecated. Whether or not the document is standalone, as specified by the XML declaration; corresponds to xmlStandalone.
strictErrorChecking: Throws DOMException on errors. Defaults to true.
substituteEntities: Proprietary. Whether or not to substitute entities. This attribute is not part of the DOM specification and is specific to libxml. Defaults to false.

Caution
Enabling entity substitution may facilitate XML External Entity (XXE) attacks.
validateOnParse: Loads and validates against the DTD. Defaults to false.

Caution
Enabling DTD validation may facilitate XML External Entity (XXE) attacks.
version: Deprecated. Version of XML; corresponds to xmlVersion.
xmlEncoding: An attribute specifying the encoding of this document. This is null when the encoding is unspecified or unknown, such as when the document was created in memory.
xmlStandalone: An attribute specifying whether this document is standalone. This is false when unspecified. A standalone document is one where there are no external markup declarations. An example of such a markup declaration is when the DTD declares an attribute with a default value.
xmlVersion: An attribute specifying the version number of this document. If there is no declaration and if this document supports the "XML" feature, the value is "1.0".
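
Several of these properties are easiest to understand by reading them back from a parsed document. The following is a minimal sketch rather than an official example; the XML string and variable names are invented for illustration, and childElementCount on the document itself assumes PHP 8.0 or later.

```php
<?php
// Illustrative input: a declaration plus a root element with two children.
$doc = new DOMDocument();
$doc->loadXML('<?xml version="1.0" encoding="UTF-8"?><root><a/><b/></root>');

$rootName  = $doc->documentElement->nodeName;           // name of the root element
$version   = $doc->xmlVersion;                          // from the XML declaration
$encoding  = $doc->xmlEncoding;                         // from the XML declaration
$docCount  = $doc->childElementCount;                   // element children of the document
$rootCount = $doc->documentElement->childElementCount;  // element children of <root>
$firstName = $doc->firstElementChild->nodeName;         // first element child of the document

printf("%s %s %s %d %d %s\n", $rootName, $version, $encoding, $docCount, $rootCount, $firstName);
```

The document node has a single element child (the root), while the root element has two.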

Changelog

Version	Description
8.0.0	DOMDocument now implements DOMParentNode.
8.0.0	The unimplemented method DOMDocument::renameNode() has been removed.

Notes

Note:
The DOM extension uses UTF-8 encoding. Use mb_convert_encoding(), UConverter::transcode(), or iconv() to handle other encodings.

Note:
When using json_encode() on a DOMDocument object, the result will be that of encoding an empty object.
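
The json_encode() behaviour can be checked with a short sketch. The sample XML and the 'xml' array key are assumptions made for illustration; serialising with saveXML() first is one possible way to carry the document inside JSON, not the only one.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><item>1</item></root>');

// Encoding the object directly does not serialise the tree.
$json = json_encode($doc);

// One option: serialise to an XML string first and encode that string.
$payload = json_encode(['xml' => $doc->saveXML()]);
$decoded = json_decode($payload, true);

echo $json, "\n";
echo $decoded['xml'];
```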

See Also

» W3C specification for Document

Table of Contents

DOMDocument::adoptNode - Transfer a node from another document
DOMDocument::append - Appends nodes after the last child node
DOMDocument::__construct - Creates a new DOMDocument object
DOMDocument::createAttribute - Create new attribute
DOMDocument::createAttributeNS - Create new attribute node with an associated namespace
DOMDocument::createCDATASection - Create new cdata node
DOMDocument::createComment - Create new comment node
DOMDocument::createDocumentFragment - Create new document fragment
DOMDocument::createElement - Create new element node
DOMDocument::createElementNS - Create new element node with an associated namespace
DOMDocument::createEntityReference - Create new entity reference node
DOMDocument::createProcessingInstruction - Creates new PI node
DOMDocument::createTextNode - Create new text node
DOMDocument::getElementById - Searches for an element with a certain id
DOMDocument::getElementsByTagName - Searches for all elements with given local tag name
DOMDocument::getElementsByTagNameNS - Searches for all elements with given tag name in specified namespace
DOMDocument::importNode - Import node into current document
DOMDocument::load - Load XML from a file
DOMDocument::loadHTML - Load HTML from a string
DOMDocument::loadHTMLFile - Load HTML from a file
DOMDocument::loadXML - Load XML from a string
DOMDocument::normalizeDocument - Normalizes the document
DOMDocument::prepend - Prepends nodes before the first child node
DOMDocument::registerNodeClass - Register extended class used to create base node type
DOMDocument::relaxNGValidate - Performs relaxNG validation on the document
DOMDocument::relaxNGValidateSource - Performs relaxNG validation on the document
DOMDocument::replaceChildren - Replace children in document
DOMDocument::save - Dumps the internal XML tree back into a file
DOMDocument::saveHTML - Dumps the internal document into a string using HTML formatting
DOMDocument::saveHTMLFile - Dumps the internal document into a file using HTML formatting
DOMDocument::saveXML - Dumps the internal XML tree back into a string
DOMDocument::schemaValidate - Validates a document based on a schema. Only XML Schema 1.0 is supported.
DOMDocument::schemaValidateSource - Validates a document based on a schema
DOMDocument::validate - Validates the document based on its DTD
DOMDocument::xinclude - Substitutes XIncludes in a DOMDocument object

User Contributed Notes 19 notes

Fernando H ¶

16 years ago

Here is a quick example of how to use this class, so that new users can get started without having to figure it all out by themselves. (At the time of posting, this documentation had just been added and lacked examples.)

<?php

// Set the content type to be XML, so that the browser will recognise it as XML.
header( "content-type: application/xml; charset=ISO-8859-15" );

// "Create" the document.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );

// Create some elements.
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track", "The ninth symphony" );

// Set the attributes.
$xml_track->setAttribute( "length", "0:01:15" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );

// Create another element, just to show you can add any (realistic to computer) number of sublevels.
$xml_note = $xml->createElement( "Note", "The last symphony composed by Ludwig van Beethoven." );

// Append the whole bunch.
$xml_track->appendChild( $xml_note );
$xml_album->appendChild( $xml_track );

// Repeat the above with some different values..
$xml_track = $xml->createElement( "Track", "Highway Blues" );

$xml_track->setAttribute( "length", "0:01:33" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );
$xml_album->appendChild( $xml_track );

$xml->appendChild( $xml_album );

// Serialise the document to a string and print it.
print $xml->saveXML();

?>

Output (indented here for readability, XML declaration omitted):
<Album>
  <Track length="0:01:15" bitrate="64kb/s" channels="2">
    The ninth symphony
    <Note>
      The last symphony composed by Ludwig van Beethoven.
    </Note>
  </Track>
  <Track length="0:01:33" bitrate="64kb/s" channels="2">Highway Blues</Track>
</Album>

If you want your PHP->DOM code to run under the .xml extension, you should set up your web server to run the .xml extension with PHP (refer to the installation/configuration documentation for PHP on how to do this).

Note that this:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = $xml->createElement( "Album" );
$xml_track = $xml->createElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>

is NOT the same as this:
<?php
// Will NOT work.
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml_track = new DOMElement( "Track" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
?>

although this will work:
<?php
$xml = new DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = new DOMElement( "Album" );
$xml->appendChild( $xml_album );
?>
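
The second snippet fails because an element created with new DOMElement is read-only until it is associated with a document, so appending a child to it throws a DOMException. A minimal sketch of one way to keep using the constructor, assuming the same element names as above: attach each element to the tree first, then append to the node that appendChild() returns.

```php
<?php
$xml = new DOMDocument("1.0", "ISO-8859-15");

// appendChild() returns the appended node, which now belongs to $xml.
$xml_album = $xml->appendChild(new DOMElement("Album"));

// The album is part of a document, so it can now receive children.
$xml_track = $xml_album->appendChild(new DOMElement("Track"));

$output = $xml->saveXML();
print $output;
```

Using $xml->createElement() as in the first snippet avoids the ordering constraint altogether.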

andreas at userbrain dot com ¶

2 years ago

After struggling with parsing and modifying partial HTML content for several hours, I came to this solution which does work for me and is relatively simple compared to what else I found online.

This solution fixes unwanted DOCTYPE and html, body tags as well as encoding issues.

<?php

// Assumption: content is utf-8 encoded
$content = "<h1>This is a heading</h1><p>This is a paragraph</p>";

// Load content to a div and specify encoding with a meta tag
$temp_dom = new DOMDocument();
$temp_dom->loadHTML("<meta http-equiv='Content-Type' content='charset=utf-8' /><div>$content</div>");

// As loadHTML() adds a DOCTYPE as well as <html> and <body> tag, let’s create another DOMDocument and import just the nodes we want
$dom = new DOMDocument();
$first_div = $temp_dom->getElementsByTagName('div')[0];
$first_div_node = $dom->importNode($first_div, true);
$dom->appendChild($first_div_node);

// Do whatever you want to do
$dom->getElementsByTagName('h1')[0]->setAttribute('class', 'happy');

// You could also just echo $dom->saveHtml() if you don’t mind the div and whitespace 
echo substr(trim($dom->saveHtml()), 5, -6);

// Outputs: <h1 class="happy">This is a heading</h1><p>This is a paragraph</p>
?>
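
The substr() call above relies on the wrapper being exactly a bare div. As a sketch of an alternative under the same wrapper-div approach, each child of the wrapper can be serialised with saveHTML($node), so no string offsets are needed. The variable names are illustrative, and the exact whitespace between the serialised siblings may vary with the libxml version.

```php
<?php
// Assumption: content is UTF-8 encoded, as in the note above.
$content = "<h1>This is a heading</h1><p>This is a paragraph</p>";

$dom = new DOMDocument();
$dom->loadHTML("<meta http-equiv='Content-Type' content='text/html; charset=utf-8'><div>$content</div>");

// Work directly inside the wrapper element.
$wrapper = $dom->getElementsByTagName('div')->item(0);
$wrapper->getElementsByTagName('h1')->item(0)->setAttribute('class', 'happy');

// Serialise the wrapper's children one by one; the wrapper itself is never output.
$html = '';
foreach ($wrapper->childNodes as $child) {
    $html .= $dom->saveHTML($child);
}

echo $html;
```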

developer at nabtron dot com ¶

8 years ago

For those landing here looking for a fix for encoding issues with UTF-8 characters: it's pretty easy to correct, without adding any additional output tag to your HTML.

We'll be utilizing: mb_convert_encoding

Thanks to the user who shared: SmartDOMDocument in previous comments, I got the idea of solving it. However I truly wish that he shared the method instead of giving a link.

Anyway coming back to the solution, you can simply use:

<?php

// checks that the content we're receiving isn't empty, to avoid the warning
if ( empty( $content ) ) {
    return false;
}

// converts the UTF-8 characters to HTML entities so that loadHTML() does not garble them
$content = mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8');

// creating new document
$doc = new DOMDocument('1.0', 'utf-8');

// collect libxml errors internally instead of emitting warnings
libxml_use_internal_errors(true);

// loads the content without adding the enclosing html/body tags or the doctype declaration
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// do whatever you want to do with this code now

?>
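
As of PHP 8.2, passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated. A possible substitute, sketched here on the assumption that the mbstring extension is available and the input is UTF-8, is mb_encode_numericentity(), which rewrites every non-ASCII character as a numeric character reference before the markup reaches loadHTML(). The sample string and variable names are invented for illustration.

```php
<?php
// Sample UTF-8 fragment, invented for this sketch ("Café crème").
$content = "<p>Caf\u{e9} cr\u{e8}me</p>";

// Map every code point from U+0080 upwards to a numeric entity; ASCII markup is left alone.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($content, $convmap, 'UTF-8');

$doc = new DOMDocument('1.0', 'utf-8');

// Collect parser errors internally, then restore the previous setting.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Text read through the DOM comes back as UTF-8.
$text = $doc->documentElement->textContent;
echo $text, "\n";
```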

I hope it solves the issue for someone! If you need my help or service to fix your code, you can reach me on nabtron.com or contact me at the email mentioned with this comment.

jay at jaygilford dot com ¶

14 years ago

Here's a small function I wrote to get all page links using the DOMDocument which will hopefully be of use to others

<?php
/**
 * @author Jay Gilford
 */

/**
 * get_links()
 *
 * @param string $url
 * @return array
 */
function get_links($url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $links = array();

    // Loop through each <a> tag in the dom and add it to the link array
    foreach ($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }

    // Return the links
    return $links;
}
?>
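
To try the same loop without fetching a remote page, the HTML can be supplied as a string. This is a sketch only: get_links_from_html() is a hypothetical variant written for this illustration, not part of the note above, and the sample markup is invented.

```php
<?php
// Hypothetical variant for illustration: parses an HTML string instead of a URL.
function get_links_from_html(string $html): array {
    $dom = new DOMDocument();

    // Collect parser warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same loop as get_links(): one entry per <a> element.
    $links = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $links[] = ['url' => $link->getAttribute('href'), 'text' => $link->nodeValue];
    }
    return $links;
}

$links = get_links_from_html('<p><a href="/first">First</a> and <a href="/second">Second</a></p>');
print_r($links);
```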

tloach at gmail dot com ¶

14 years ago

For anyone else who has been having issues with formatOutput not working, here is a workaround.

Rather than just doing something like:

<?php
$outXML = $xml->saveXML();
?>

force it to reload the XML from scratch, then it will format correctly:

<?php
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
?>

evert at er dot nl ¶

13 years ago

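A nice and simple node-to-array function I wrote, worth a try ;)

<?php
function getArray($node)
{
    // Start from an empty array: writing keys into `false` is deprecated as of PHP 8.1.
    $array = array();

    // Copy the node's attributes as name => value pairs
    if ($node->hasAttributes())
    {
        foreach ($node->attributes as $attr)
        {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }

    if ($node->hasChildNodes())
    {
        if ($node->childNodes->length == 1)
        {
            // A single child (typically a text node) is stored under its node name
            $array[$node->firstChild->nodeName] = $node->firstChild->nodeValue;
        }
        else
        {
            foreach ($node->childNodes as $childNode)
            {
                if ($childNode->nodeType != XML_TEXT_NODE)
                {
                    // This is a plain function, so recurse directly rather than through $this
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    }

    return $array;
}
?>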

biker dot mike at gmx dot com ¶

7 years ago

Look out for the following gotcha when loading XML from a string:

<?php
$doc = new \DOMDocument;
$doc->documentURI = $myXmlFilename;
$doc->loadXML($myXmlString);
?>

documentURI is now set to the value of $myXmlFilename, right?

Wrong!

It's set to the current working directory.  If you want to manually set documentURI to something other than the CWD, do so AFTER the call to loadXML().

E.g.:
<?php
$doc = new \DOMDocument;
$doc->loadXML($myXmlString);
$doc->documentURI = $myXmlFilename;
?>

documentURI really is now set to the value of $myXmlFilename.

Nick M ¶

12 years ago

You may need to save all or part of a DOMDocument as an XHTML-friendly string, something compliant with both XML and HTML 4. Here's the DOMDocument class extended with a saveXHTML method:

<?php

/**
 * XHTML Document
 *
 * Represents an entire XHTML DOM document; serves as the root of the document tree.
 */
class XHTMLDocument extends DOMDocument {

  /**
   * These tags must always self-terminate. Anything else must never self-terminate.
   * 
   * @var array
   */
  public $selfTerminate = array(
      'area','base','basefont','br','col','frame','hr','img','input','link','meta','param'
  );
  
  /**
   * saveXHTML
   *
   * Dumps the internal XML tree back into an XHTML-friendly string.
   *
   * @param DOMNode $node
   *         Use this parameter to output only a specific node rather than the entire document.
   */
  public function saveXHTML(?DOMNode $node = null) {
    
    if (!$node) $node = $this->firstChild;
    
    $doc = new DOMDocument('1.0');
    $clone = $doc->importNode($node->cloneNode(false), true);
    $term = in_array(strtolower($clone->nodeName), $this->selfTerminate);
    $inner='';
    
    if (!$term) {
      $clone->appendChild(new DOMText(''));
      if ($node->childNodes) foreach ($node->childNodes as $child) {
        $inner .= $this->saveXHTML($child);
      }
    }
    
    $doc->appendChild($clone);
    $out = $doc->saveXML($clone);
    
    return $term ? substr($out, 0, -2) . ' />' : str_replace('><', ">$inner<", $out);

  }

}

?>

This hasn't been benchmarked, but is probably significantly slower than saveXML or saveHTML and should be used sparingly.

fcartegnie ¶

14 years ago

Be careful with formatOutput.

Creating an empty node like this:
createElement('foo','')
instead of
createElement('foo')
will break formatOutput.

cmyk777 at gmail dot com ¶

14 years ago

This function may help to debug the current DOM element:

<?php
function dom_dump($obj) {
    if ($classname = get_class($obj)) {
        $retval = "Instance of $classname, node list: \n";
        switch (true) {
            case ($obj instanceof DOMDocument):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->saveXML($obj);
                break;
            case ($obj instanceof DOMElement):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
                break;
            case ($obj instanceof DOMAttr):
                $retval .= "XPath: {$obj->getNodePath()}\n".$obj->ownerDocument->saveXML($obj);
                //$retval .= $obj->ownerDocument->saveXML($obj);
                break;
            case ($obj instanceof DOMNodeList):
                for ($i = 0; $i < $obj->length; $i++) {
                    $retval .= "Item #$i, XPath: {$obj->item($i)->getNodePath()}\n".
"{$obj->item($i)->ownerDocument->saveXML($obj->item($i))}\n";
                }
                break;
            default:
                return "Instance of unknown class";
        }
    } else {
        return 'no elements...';
    }
    return htmlspecialchars($retval);
}
?>

Example usage:

<?php
$dom = new DomDocument();
$dom->load('test.xml');
$body = $dom->documentElement->getElementsByTagName('book');
echo '<pre>'.dom_dump($body).'</pre>';
?>

Output:

Instance of DOMNodeList, node list: 
Item #0, XPath: /library/book[1]
<book isbn="0345342968">
<title>Fahrenheit 451</title>
<author>R. Bradbury</author>
<publisher>Del Rey</publisher>
</book>
Item #1, XPath: /library/book[2]
<book isbn="0048231398">
<title>The Silmarillion</title>
<author>J.R.R. Tolkien</author>
<publisher>G. Allen &amp; Unwin</publisher>
</book>
Item #2, XPath: /library/book[3]
<book isbn="0451524934">
<title>1984</title>
<author>G. Orwell</author>
<publisher>Signet</publisher>
</book>
Item #3, XPath: /library/book[4]
<book isbn="031219126X">
<title>Frankenstein</title>
<author>M. Shelley</author>
<publisher>Bedford</publisher>
</book>
Item #4, XPath: /library/book[5]
<book isbn="0312863551">
<title>The Moon Is a Harsh Mistress</title>
<author>R. A. Heinlein</author>
<publisher>Orb</publisher>
</book>

610010559 at qq dot com ¶

2 years ago

When you add a new element to already formatted XML through appendChild(), you will find that the new element is not formatted (no indentation, no line break). Here is my solution: in short, load the XML with preserveWhiteSpace disabled. An example is shown below:
<?php
$doc = new \DOMDocument();
$doc->formatOutput = true;
$doc->preserveWhiteSpace = false; // this is the key; the default value is true
$doc->loadXML($xmlStr);
// append inside the root element, since a well-formed document has only one root
$doc->documentElement->appendChild($doc->createElement('php', '666'));
$formattedXMLStr = $doc->saveXML(); // DOMDocument formats the XML string for you
echo $formattedXMLStr;
?>
It took me some time to work this out. I hope it saves you time.
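
A self-contained sketch of the same idea that can be run as-is. The XML string and the element names are invented for illustration; the input is already indented, which is the case where a newly appended element would otherwise end up on the same line as the closing tag.

```php
<?php
// Illustrative input that already contains indentation whitespace.
$xmlStr = "<root>\n  <a>1</a>\n</root>";

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // drop the existing whitespace-only text nodes on load
$doc->formatOutput = true;        // re-indent everything on save
$doc->loadXML($xmlStr);

// Append a new element inside the root.
$doc->documentElement->appendChild($doc->createElement('b', '2'));

$out = $doc->saveXML();
echo $out;
```

Both properties are set before loadXML() because preserveWhiteSpace only takes effect while the document is being parsed.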

sites.sitesbr.net ¶

11 years ago

How to objectify a DOMDocument with a hierarchy like:
<root>
    <item>
          <prop1>info1</prop1>
          <prop2>info2</prop2>
          <prop3>info3</prop3>
     </item>
    <item>
          <prop1>info1</prop1>
          <prop2>info2</prop2>
          <prop3>info3</prop3>
     </item>
</root>

The information can then be retrieved in object style, as in:

<?php
     $theNodeValue = $aitem->prop1;
?>

Here is the code: one Class and 2 functions.

<?php
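 // Child and attribute names become dynamic properties; PHP 8.2 deprecates these without this attribute.
 #[\AllowDynamicProperties]
 class ArrayNode{
       public $nodeName, $nodeValue;
 }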

 function getChildNodeElements( $domNode ){
     $nodes = array();
     for( $i=0; $i < $domNode->childNodes->length; $i++){
       $cn = $domNode->childNodes->item($i);
       if( $cn->nodeType == 1){
           $nodes[] = $cn;
           }
     }
    return $nodes;
 }

 function getArrayNodes( $domDoc ){
     $res = array();

       for( $i=0; $i < $domDoc->childNodes->length; $i++){
       $cn = $domDoc->childNodes->item($i);
       # The first is the root tag...
          if( $cn->nodeType == 1){
               # But we want its childNodes.
                $sub_cn = getChildNodeElements( $cn);
                # Found the tagName:
                $baseItemTagName = $sub_cn[0]->nodeName;
                break;
            }
        }

       $dnl = $domDoc->getElementsByTagName( $baseItemTagName);

       for( $i=0; $i< $dnl->length; $i++){
          $arrayNode = new ArrayNode();

      # Summary
      $arrayNode->nodeName = $dnl->item($i)->nodeName;
      $arrayNode->nodeValue = $dnl->item($i)->nodeValue;

      # Child Nodes
      $cn = $dnl->item($i)->childNodes;
      for( $k=0; $k<$cn->length; $k++){
           if( $cn->item($k)->nodeName == "#text" && trim($cn->item($k)->nodeValue) == "") continue;
           $arrayNode->{$cn->item($k)->nodeName} = $cn->item($k)->nodeValue;
      }

      # Attributes
      $attr = $dnl->item($i)->attributes;
      for( $k=0; $k < $attr->length; $k++){
           if(! is_null($attr)){
            if( $attr->item($k)->nodeName == "#text" && trim($attr->item($k)->nodeValue) == "") continue;
            $arrayNode->{$attr->item($k)->nodeName} = $attr->item($k)->nodeValue;
           }
      }

      $res[] = $arrayNode;

       }

     return $res;
 }
?>

To use it:

<?php

  # First you load a XML in a DomDocument variable.

   $url = "/path/to/yourxmlfile.xml";
   $domSrc = file_get_contents($url);
   $dom = new DomDocument();
   $dom->loadXML( $domSrc );

  # Then, you get the ArrayNodes from the DomDocument.

    $ans = getArrayNodes( $dom );

 
    for( $i=0; $i < count( $ans ) ; $i++){

    $cn =  $ans[ $i];

    $info1 =  $cn->prop1;
    $info2 =  $cn->prop2;
    $info3 =  $cn->prop3;
      
         // ...
 
   }

?>

pastormontesinos at gmail dot com ¶

3 years ago

To handle script nodes safely when parsing, the best option is to extend DOMDocument: set the script tags aside while DOMDocument processes the markup, and restore them right after saveHTML() is called. Here is my custom class.

<?php 

class SafeDOMDocument extends \DOMDocument
{
    const REGEX_JS            = '#(\s*<!--(\[if[^\n]*>)?\s*(<script.*</script>)+\s*(<!\[endif\])?-->)|(\s*<script.*</script>)#isU';
    const SUBSTITUTION_FORMAT = '<!--<script class="script_%s"></script>-->';
    private $matchedScripts = [];

    public function loadHTML($source, $options = 0)
    {
        $this->formatOutput        = false;
        $this->preserveWhiteSpace  = true;
        $this->validateOnParse     = false;
        $this->strictErrorChecking = false;
        $this->recover             = false;
        $this->resolveExternals    = false;
        $this->substituteEntities  = false;
        $matches = [];
        $success = preg_match_all(self::REGEX_JS, $source, $matches);

        if ($success && !empty($matches)) {
            foreach ($matches[0] as $match) {
                $storedScript = rtrim(ltrim($match, "\n\r\t "), "\n\r\t ");
                $scriptId = md5($storedScript);
                $key = sprintf(self::SUBSTITUTION_FORMAT, $scriptId);
                $source = str_replace($match, $key, $source);
                $this->matchedScripts[$key] = $storedScript;
            }
        }

        return parent::loadHTML($source, $options);
    }

    public function saveHTML(?DOMNode $node = null)
    {
        $output = parent::saveHTML($node);

        if (count($this->matchedScripts)) {
            foreach ($this->matchedScripts as $substitution => $originalSnippet) {
                $output = str_replace($substitution, $originalSnippet, $output);
            }
        }

        return $output;
    }
}
?>

ashjkshdu283 at gmail dot com ¶

5 years ago

This function evolved from the post by jay at jaygilford dot com. It returns an array of the values of the specified attribute ($attr) for all elements of the DOMDocument object.

<?php

function getAttrData(string $attr, DOMDocument $dom) {
    // Empty array to hold all attribute values to return
    $attrData = array();

    // Loop through each tag in the DOM and add its attribute data to the array
    foreach ($dom->getElementsByTagName('*') as $tag) {
        if (empty($tag->getAttribute($attr)) === false) {
            array_push($attrData, $tag->getAttribute($attr));
        }
    }

    // Return the array of unique attribute values
    return array_unique($attrData);
}

$html = '
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>
<a href="#someLink" id="someLink" class="link-class">Some Link</a>
<a href="#someOtherLink" id="someOtherLink" class="link-class">Some Other Link</a>
<h1 id="header1" class="header-class">My First Heading</h1>
<p id="para1" class="para-class">My first paragraph.</p>
</body>
</html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
var_dump(getAttrData('class', $dom));
?>

ingjetel at gmail dot com ¶

8 years ago

Easy function for basic output of XML file via DOM parsing

<?php
$dom = new DomDocument();
$dom->load("./file.xml") or die("error");
$start = $dom->documentElement;
fc($start);

function fc($node) {
  $child = $node->childNodes;
  foreach($child as $item) {
    if ($item->nodeType == XML_TEXT_NODE) {
      if (strlen(trim($item->nodeValue))) echo trim($item->nodeValue)."<br/>";
    }
    else if ($item->nodeType == XML_ELEMENT_NODE) fc($item);
  }
}
?>

admin at beerpla dot net ¶

14 years ago

After seeing many complaints about certain DOMDocument shortcomings, such as bad handling of encodings and always saving HTML fragments with <html>, <head>, and DOCTYPE, I decided that a better solution is needed.

So here it is: SmartDOMDocument. You can find it at http://beerpla.net/projects/smartdomdocument/

Currently, the main highlights are:

- SmartDOMDocument inherits from DOMDocument, so it's very easy to use - just declare an object of type SmartDOMDocument instead of DOMDocument and enjoy the new behavior on top of all existing functionality (see example below).

- saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off).
Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want - it saves HTML without adding that extra garbage that DOMDocument does.

- encoding fix - DOMDocument notoriously doesn't handle encoding (at least UTF-8) correctly and garbles the output.
SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you - just use loadHTML() as you would normally.

- SmartDOMDocument Object As String - you can use a SmartDOMDocument object as a string which will print out its contents.
For example:
<?php
echo "Here is the HTML: $smart_dom_doc";
?>

I'm going to maintain this code and try to fix bugs as they come in.

Enjoy.

danny dot nunez15 at gmail dot com ¶

10 years ago

A simple function to grab all links in a page.

<?php
function get_links($url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $links = array();

    // Loop through each <a> tag in the dom and add its href to the link array
    foreach ($xml->getElementsByTagName('a') as $link) {
        // Use a separate variable so the $url parameter is not overwritten
        $href = $link->getAttribute('href');
        if (!empty($href)) {
            $links[] = $href;
        }
    }

    // Return the links
    return $links;
}
?>

PhilipWayneRollins at gmail dot com ¶

14 years ago

If you want to use DOMDocument to create XHTML documents, here is a simple class.

Note that it is designed for creating XHTML documents from scratch, but it could easily be extended to work with existing XHTML documents. Also, this is for XHTML, not XML.

<?php
    class Document
    {
        public $doctype;
        public $head;
        public $title = 'Sensei Ninja';
        public $body;
        private $styles;
        private $metas;
        private $scripts;
        private $document;
        
        
        function __construct (  )
        {
            $this->document = new DOMDocument( );
            $this->head = $this->document->createElement( 'head', ' ' );
            $this->body = $this->document->createElement( 'body', ' ' );
        }
        
        
        public function addStyleSheet ( $url, $media='all' )
        {
            $element = $this->document->createElement( 'link' );
            $element->setAttribute( 'type', 'text/css' );
            $element->setAttribute( 'href', $url );
            $element->setAttribute( 'media', $media );
            $this->styles[] = $element;
        }
        
        
        public function addScript ( $url )
        {
            $element = $this->document->createElement( 'script', ' ' );
            $element->setAttribute( 'type', 'text/javascript' );
            $element->setAttribute( 'src', $url );
            $this->scripts[] = $element;
        }
        
        
        public function addMetaTag ( $name, $content )
        {
            $element = $this->document->createElement( 'meta' );
            $element->setAttribute( 'name', $name );
            $element->setAttribute( 'content', $content );
            $this->metas[] = $element;
        }
        
        
        public function setDescription ( $dec )
        {
            $this->addMetaTag( 'description', $dec );
        }
        
        
        public function setKeywords ( $keywords )
        {
            $this->addMetaTag( 'keywords', $keywords );
        }
        
        public function createElement ( $nodeName, $nodeValue=null )
        {
          return $this->document->createElement( $nodeName, $nodeValue ); 
        }
        
        public function assemble ( )
        {
            // Doctype creation
            $doctype = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
            
            // Create the head element
            $title = $this->document->createElement( 'title', $this->title );
            // Add stylesheets if needed
            if ( is_array( $this->styles ))
                foreach ( $this->styles as $element )
                    $this->head->appendChild( $element );
            // Add scripts if needed
            if(  is_array( $this->scripts ))
                foreach ( $this->scripts as $element )
                    $this->head->appendChild( $element );
            // Add meta tags if needed
            if ( is_array( $this->metas ))
                foreach ( $this->metas as $element )
                    $this->head->appendChild( $element );
            $this->head->appendChild( $title );
            
            // Create the document
            $html = $this->document->createElement( 'html' );
            $html->setAttribute( 'xmlns', 'http://www.w3.org/1999/xhtml' );
            $html->setAttribute( 'xml:lang', 'en' );
            $html->setAttribute( 'lang', 'en' );
            $html->appendChild( $this->head );
            $html->appendChild( $this->body );
            
            
            $this->document->appendChild( $html );
            return $doctype . $this->document->saveXML( );
        }
        
    }
    
?>

Small example

<?php
    $document = new Document( );
    $document->title = 'Hello';
    $document->addStyleSheet( 'StyleSheets/main.css' );
    $div = $document->createElement( 'div' );
    $div->nodeValue = 'Hello, world!';
    $div->setAttribute( 'style', 'color: red;' );
    $document->body->appendChild( $div );
    printf( '%s', $document->assemble( ) );
?>

qrworld.net ¶

9 years ago

In this post http://softontherocks.blogspot.com/2014/11/descargar-el-contenido-de-una-url_11.html I found a simple way to get the content of a URL with DOMDocument, loadHTMLFile() and saveHTML().

<?php
function getURLContent($url){
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = FALSE;
    @$doc->loadHTMLFile($url);
    return $doc->saveHTML();
}
?>
