html vs markdown #19

Closed
opened 2021-08-31 17:18:04 +02:00 by chris · 2 comments
Owner

Markdown is safe from rogue CSS, but its text is not included in the search.

HTML is included in the search, but its CSS can get out of hand.

Need to study the HTML block more closely.

Author
Owner
This might be a solution: https://omeka.org/s/docs/developer/modules/page_blocks/#getfulltexttext
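
A rough sketch of the text extraction such a hook could return, in case it helps. The helper name `blockTextForSearch` is made up for illustration and is not part of Omeka S; it only assumes the block can already be rendered to an HTML string.

```php
<?php
// Hypothetical helper (not an Omeka S API): reduce rendered block HTML
// to plain text suitable for a full-text search index.
function blockTextForSearch(string $renderedHtml): string
{
    // Put a space before every tag so adjacent blocks such as
    // "</p><p>" do not glue their words together once tags are removed.
    $text = strip_tags(str_replace('<', ' <', $renderedHtml));
    // Decode entities such as &amp; so the index stores real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse the whitespace left behind by the removed tags.
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

echo blockTextForSearch('<p>Fish &amp; chips</p><p>Tea</p>'), PHP_EOL;
```

If the linked docs are read correctly, the Markdown block layout would return this string from its `getFulltextText()` method, passing in the block's rendered output. The trade-off of the space-before-tag trick is that inline tags also split words (`good<em>bye</em>` is indexed as two words), which is usually acceptable for search.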
Author
Owner

Some progress.

Our Markdown module is now searchable!! So that's good. :)

I also have an HTML module hack that removes unwanted styling.

This is horrible.

./application/src/Site/BlockLayout/Html.php

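use DOMDocument;


    public function onHydrate(SitePageBlock $block, ErrorStore $errorStore)
    {
        $data = $block->getData();
        $html = isset($data['html']) ? $this->htmlPurifier->purify($data['html']) : '';
        // archive strip tags
        $html = strip_tags($html, '<p><br><a><ol><ul><li>');
        // archive remove attributes
        $sanitized_html = '';
        if (trim($html) !== '') { // loadHTML() cannot parse an empty string
            $document = new DOMDocument();
            // note: loadHTML() assumes ISO-8859-1 unless the markup declares a charset
            $document->loadHTML($html);
            // strip styling from every remaining element, not only <p>
            foreach ($document->getElementsByTagName('*') as $element) {
                $element->removeAttribute('class');
                $element->removeAttribute('style');
                $element->removeAttribute('align');
            }
            // serialise the children of <body> so lists outside <p> are kept
            $body = $document->getElementsByTagName('body')->item(0);
            if ($body !== null) {
                foreach ($body->childNodes as $node) {
                    $sanitized_html .= $document->saveHTML($node);
                }
            }
        }
        // store the sanitized markup (the original stored the unsanitized $html)
        $data['html'] = $sanitized_html . PHP_EOL;
        $block->setData($data);
    }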

To be continued..

Reference: arcHIVE-tech/fixes#19